Signal processing device and signal processing method

ABSTRACT

Provided is a signal processing device including a display control unit for causing a display to display an image corresponding to a specified place, a sound-collection-signal input unit for inputting a sound collection signal of a sound collection unit that collects a user sound produced with microphones surrounding the user, an acoustic-signal processing unit for performing a first acoustic-signal process for reproducing a sound field where the user sound is sensed as if the sound were echoing in the place on the signal input by the sound-collection-signal input unit, based on a first transfer function measured in the place to indicate how a sound emitted on a closed surface inside the place echoes in the place and then is transferred to the closed-surface side, and a sound-emission control unit for causing a sound based on the processed signal to be emitted from speakers surrounding the user.

TECHNICAL FIELD

The present technology relates to a signal processing device that gives users an excellent sense of immersion in a given place, and to a method thereof.

BACKGROUND ART

In recent years, with respect to map information services provided on the Internet and in application software, new services have been proposed in addition to aerial-view maps that are expressed with figures, symbols, and the like. These include displaying combinations of photographs from satellites and displaying images which are recorded by actually photographing views and states of streets on the ground at positions on a map. Particularly, a service that uses image information photographed on the ground is very useful for checking a place that a user has not visited before.

On the other hand, sense-of-immersion technologies (immersive reality) that give a user (viewer) a feeling that “It feels just like I am in that place” by covering his or her visual field have been widely studied. Most of them are realized by placing the user himself or herself in the middle of a box-like place that is covered with five or six faces (including the ceiling and the floor) on which images can be displayed (projected).

A sense of presence is considered to be obtainable by displaying, on such a sense-of-immersion display, an actual photograph which is linked to the foregoing map information (for example, after performing a process of making a person life-sized).

CITATION LIST

Patent Literature

Patent Literature 1: JP 4674505B

Patent Literature 2: JP 4775487B

Patent Literature 3: JP 4725234B

Patent Literature 4: JP 4883197B

Patent Literature 5: JP 4735108B

SUMMARY OF INVENTION

Technical Problem

In order to obtain a higher sense of presence and sense of immersion, however, a system for expressing spatial information in addition to images is demanded.

The present technology takes these circumstances into consideration, and aims to provide a technology that can heighten a sense of immersion for a user more than when only image information is presented.

Solution to Problem

In order to solve the problem, according to the present technology, there is provided a signal processing device including

a display control unit configured to cause a necessary display unit to display an image that corresponds to a place specified from designated position information,

a sound collection signal input unit configured to input a sound collection signal of a sound collection unit that collects a sound produced by a user with a plurality of microphones disposed to surround the user,

an acoustic signal processing unit configured to perform a first acoustic signal process for reproducing a sound field in which the sound produced by the user is sensed as if the sound were echoing in the place specified from the position information on the signal input by the sound collection signal input unit, based on a first transfer function that is measured in the place specified from the designated position information to indicate how a sound emitted on a closed surface inside the place echoes in the place and then is transferred to the closed surface side, and

a sound emission control unit configured to cause a sound that is based on the signal that has undergone the first acoustic signal process by the acoustic signal processing unit to be emitted from a plurality of speakers disposed to surround the user.

In addition, according to the present technology, there is provided a signal processing method using a display unit, a sound collection unit that collects a sound produced by a user with a plurality of microphones disposed to surround the user, and a sound emission unit that performs sound emission with a plurality of speakers disposed to surround the user, the method including

a display control procedure in which an image that corresponds to a place specified from designated position information is caused to be displayed on the display unit,

an acoustic signal processing procedure in which a first acoustic signal process for reproducing a sound field in which a sound produced by the user is sensed as if the sound were echoing in the place specified from the position information is performed on a sound collection signal of the sound collection unit, based on a first transfer function that is measured in the place specified from the designated position information to indicate how a sound emitted from a closed surface side inside the place echoes in the place and then is transferred to the closed surface side, and

a sound emission control procedure in which a sound that is based on the signal that has undergone the first acoustic signal process in the acoustic signal processing procedure is caused to be emitted from the sound emission unit.

According to the present technology, an image that corresponds to a place specified from designated position information is presented and a sound field in which a sound produced by a user is sensed as if it were echoing in the place specified from the designated position information is provided to the user.

Here, in order to increase a sense of presence and a sense of immersion, the presence of a “sound” that expresses spatial information as well as an image is important. Thus, according to the present technology, a sense of immersion for a user can be heightened more than when only image information is presented.

Advantageous Effects of Invention

According to the present technology described above, a sense of immersion for a user can be heightened more than when only image information is presented.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram for describing an overview of a reproduction technique realized in a signal processing system of an embodiment.

FIG. 2 is a diagram for describing a technique for sound field reproduction in an embodiment.

FIG. 3 is a diagram for describing an overview of a technique for sound field reproduction of an embodiment.

FIG. 4 is a diagram for describing measurement techniques of transfer functions for realizing sound field reproduction of an embodiment.

FIG. 5 is a diagram showing a plurality of speakers disposed in a reproduction environment and their closed surfaces and a plurality of microphones and their closed surfaces.

FIG. 6 is an illustrative diagram regarding a specific technique for measuring a transfer function as Measurement 1.

FIG. 7 is also an illustrative diagram regarding the specific technique for measuring a transfer function as Measurement 1.

FIG. 8 is an illustrative diagram regarding a system configuration for performing measurement of a transfer function.

FIG. 9 is a diagram showing an example of impulse response measurement data.

FIG. 10 is an illustrative diagram regarding a configuration for suppressing adverse influence derived from components other than reverberant sound components (direct sounds or early reflection sounds).

FIG. 11 is an illustrative diagram regarding a specific technique for measuring a transfer function as Measurement 2.

FIG. 12 is a diagram for describing a configuration of a signal processing system for realizing a signal processing technique as an embodiment.

FIG. 13 is an illustrative diagram regarding the content of correspondence relation information.

FIG. 14 is a diagram showing a specific internal configuration example of a matrix convolution unit.

FIG. 15 is a flowchart showing the content of a process to be executed in this system to realize a reproduction operation as an embodiment.

FIG. 16 is a diagram showing a system configuration example in which a rendering process of Technique 2 is set to be performed on a cloud.

FIG. 17 is a diagram exemplifying relations between a closed surface that is formed through disposition of speakers and a closed surface that is formed through disposition of microphones in a reproduction environment.

FIG. 18 is an illustrative diagram regarding shapes of closed surfaces.

FIG. 19 is a diagram showing a case in which a closed surface formed by arranging microphones is set inside a closed surface formed by arranging speakers in a reproduction environment.

FIG. 20 is a diagram showing a relation between closed surfaces in a measurement environment which corresponds to the case shown in FIG. 19.

FIG. 21 is a diagram exemplifying a configuration for obtaining an output which is equivalent to that of directional microphones by using omni-directional microphones.

FIG. 22 is a diagram exemplifying a configuration for obtaining an output which is equivalent to that of directional speakers by using omni-directional speakers.

FIG. 23 is a diagram showing an example in which sizes and shapes of closed surfaces differ in a measurement environment and a reproduction environment.

FIG. 24 is an illustrative diagram regarding a technique for converting a transfer function when sizes and shapes of closed surfaces differ in a measurement environment and reproduction environment.

FIG. 25 is an illustrative diagram regarding Measurement example 1 in which a moving object is used.

FIG. 26 is an illustrative diagram regarding Measurement example 2 in which a moving object is used.

FIG. 27 is an illustrative diagram regarding Measurement example 3 and Measurement example 4 in which moving objects are used.

DESCRIPTION OF EMBODIMENTS

Hereinafter, embodiments relating to the present technology will be described. Note that description will be provided in the following order.

<1. Overview of a reproduction technique realized in a signal processing system of an embodiment>

<2. Techniques for sound field reproduction>

<3. Measurement technique for sound field reproduction>

-   (3-1. Overview of a measurement technique)
-   (3-2. Regarding Measurement 1)
-   (3-3. Regarding Measurement 2)

<4. Sound field reproduction based on transfer functions>

-   (4-1. Sound field reproduction based on a first transfer function)
-   (4-2. Sound field reproduction based on a second transfer function)

<5. Configuration of a signal processing system>

<6. Modified examples>

-   (6-1. Regarding a closed surface)
-   (6-2. Regarding directivity)
-   (6-3. Resolution for a case in which sizes and shapes of closed surfaces differ in a measurement environment and a reproduction environment)
-   (6-4. Measurement technique using moving objects)
-   (6-5. Other modified examples)

1. Overview of an Operation Realized in a Signal Processing System of an Embodiment

First, an overview of a reproduction technique that is realized in a signal processing system of the present embodiment will be described using FIG. 1.

In FIG. 1, a site A refers to a place in which a user 0 is to be immersed, i.e., a place whose scene, spread of sound, and the like are desired to be reproduced (a place to be reproduced).

In addition, a site B of the drawing refers to a place in which a scene and spread of sound of a place to be reproduced are reproduced. This site B may be considered as, for example, a room of the user 0, or the like.

In the site B, a plurality of speakers 2B which are disposed to surround the user 0 and a display device 3 that displays an image are installed as shown in the drawing.

A reproduction method that is realized in the signal processing system of the present embodiment broadly includes displaying image information which corresponds to the site A using the display device 3 which is disposed in the site B, and reproducing a sound field 100 of the site A using the plurality of speakers 2B which are also disposed in the site B.

By presenting the sound field 100 of the place together with an image of the place in which the user 0 wishes to be immersed to the user, a sense of immersion in the place can be further heightened for the user 0.

Note that, although the display device 3 has been exemplified to have only one surface as a display surface in FIG. 1, it is desirable to dispose a display device 3 which has at least five display surfaces on the front, left, right, top, and bottom as shown in FIG. 2 to heighten a sense of immersion.

Here, in an actual system, a place to be reproduced as the site A can be selected from a plurality of candidates.

Designation of a place to be reproduced is performed by, for example, the user 0. For example, an arbitrary position is designated from a map image displayed on the display device 3 when a service provided in the present system is enjoyed. A place which corresponds to the position is specified from position information of the designated position, and then the place is reproduced through an image and a sound as described above.

Here, the plurality of speakers 2B in the site B shown in FIG. 1 form a space to surround the user 0.

As will be described later, a space which is formed by being surrounded by a plurality of microphones is also present in addition to the space surrounded by the plurality of speakers as described above in the present embodiment.

In the present specification, the interface of a space which is formed by being surrounded by a plurality of speakers or microphones as described above, in other words, the interface of a space which is formed by connecting the plurality of speakers or microphones to each other, is referred to as an “acoustic closed surface,” or simply as a “closed surface.”

As shown in FIG. 1, the acoustic closed surface that is formed by the plurality of speakers 2B in the site B is denoted by a closed surface 1B.

Note that a microphone may be referred to simply as a mic in the following description.

2. Techniques for Sound Field Reproduction

In the present embodiment, the sound field of the site A is reproduced in the site B as described above; however, as specific techniques of the sound field reproduction, two techniques shown in FIG. 3 (Technique 1 and Technique 2) are mainly proposed in the present embodiment.

First, in Technique 1, the sound field 100, in which a sound produced by the user 0 who is inside the closed surface 1B in the site B (for example, a voice that the user 0 produces, an impact sound that is produced when an object is dropped, a sound that is produced when utensils touch during a meal, or the like) is sensed as if it were echoing in the site A, is reproduced by a plurality of speakers 2B. As will be described later in detail, in order to realize Technique 1, sounds produced by the user 0 are collected by a plurality of mics 5B which are disposed to surround the user 0 and processed with a corresponding transfer function, and thereby an acoustic signal for sound field reproduction (an acoustic signal to be output by the speakers 2B) is generated.

Here, as in general “echolocation,” an approximate space structure can be understood empirically through auditory perception and recognition of how a sound one has produced oneself travels. Thus, according to the sound field reproduction of Technique 1 described above, the user 0 can perceive an impression of a space not only with an image but also with an acoustic factor that is based on a sound he or she has produced. As a result, a sense of immersion can be increased.

In addition, in Technique 2, the user 0 who is inside the closed surface 1B is caused to perceive an environmental sound of the site A that is a reproduction target including an echo of the sound in the site A.

Here, when the closed surface 1B is assumed to be inside the site A and a sound is set to be emitted from a given position outside the closed surface 1B inside the site A, there are also cases in which the sound is accompanied by a component of a reflective sound or a reverberant sound that is made via a structural object or an obstacle (such a sound differs depending on a material or structure of each object) present in the site A, in addition to a component that directly reaches the closed surface 1B. In Technique 2, an environmental sound of the site A as well as such an echo sound is perceived.

By implementing Technique 2 together with Technique 1 described above, a sense of immersion in the site A can be further heightened for the user 0.

3. Measurement Techniques for Sound Field Reproduction

3-1. Overview of Measurement Techniques

FIG. 4 is a diagram for describing measurement techniques of transfer functions for realizing sound field reproduction of an embodiment.

FIG. 4A schematically shows a plurality of mics 5A which are disposed inside the site A for measurement.

FIG. 4B schematically shows a measurement technique which corresponds to Technique 1 (which is denoted as Measurement 1), and FIG. 4C schematically shows a measurement technique which corresponds to Technique 2 (which is denoted as Measurement 2). FIG. 4D schematically shows a technique for recording an environmental sound of the site A without change using the plurality of mics 5A which are disposed in the site A.

Here, as shown in FIG. 4A, the interface of a space surrounded by the plurality of mics 5A which are disposed in the site A for measurement is referred to as a closed surface 1A. It is ideal to set this closed surface 1A to have the same size and shape as the closed surface 1B of the site B in which the user 0 is present. Moreover, it is desirable to set the mics 5A on the closed surface 1A to have the same conditions as the speakers 2B on the closed surface 1B in number and positional relations.

First, in Measurement 1 shown in FIG. 4B, a transfer function to be used when a sound that the user 0 who is inside the closed surface 1B has produced is processed in Technique 1 shown in FIG. 3 is measured.

Specifically in Measurement 1, a transfer function (impulse response) that indicates how a sound (a signal for measurement) outwardly emitted from the speakers 2A for measurement which are disposed in the site A is affected by an echo in the site A and then reaches each of the mics 5A which are also disposed in the site A is measured.

Thus, by processing the signal (the sound produced by the user 0) collected by the mics 5B of the site B using the transfer function and outputting the signal from the speakers 2B, the sound field 100 in which the sound produced by the user 0 is sensed as if it were echoing in the site A can be constructed in the site B.

Note that, although the example of the drawing shows that measurement is performed by disposing the speakers 2A for measurement inside the closed surface 1A on which the plurality of mics 5A are disposed, the example corresponds to a case in which the plurality of speakers 2B for reproduction (on the closed surface 1B) are disposed inside the plurality of mics 5B which collect the sound produced by the user 0 (on a closed surface 4B) in the site B as a reproduction environment. As will be described later, the positional relation of the closed surface 1B and the closed surface 4B can be reversed, and in such a case, the speakers 2A for measurement are disposed outside the closed surface 1A in Measurement 1 (refer to FIG. 5 and the like).

On the other hand, in Measurement 2 shown in FIG. 4C which corresponds to Technique 2 above, a transfer function to be used to process an acoustic signal that is based on a sound source that must be localized at an arbitrary position outside the closed surface 1B is measured.

Here, in the simplest way, Technique 2 described above can be realized by collecting environmental sounds of the site A using the plurality of mics 5A which are disposed in the site A as shown in FIG. 4D and outputting a signal of the sound collection from each of the speakers 2B at positions which correspond to those on the closed surface 1B (particularly when the speakers 2B disposed in the site B and the mics 5A disposed in the site A are set to be the same in number and positional relations).

When the environmental sounds which are simply recorded as described above are played back, however, there are problems such as that, when two or more kinds of environmental sounds are to be reproduced for one site, recording must be performed a plurality of times in that site.

Thus, in the present embodiment, the concept of so-called “object-based audio” is employed to realize Technique 2.

Here, the “object-based audio” will be briefly described.

In order to realize sound quality and a sound field, a producer generally provides a completed package of sound recorded for each channel on an existing medium, for example, a compact disc (CD) or a digital versatile disc (DVD), and an acoustic signal of each channel accommodated in each package is played to correspond to a channel of a corresponding speaker.

In recent years, however, an idea of “object-based audio (or sound field expression)” has appeared. In this idea, a sound field, sound quality, and the like that a producer intends for people to hear are considered to be overlaps of a plurality of sets, each consisting of an “acoustic stream signal of each sound source” and “meta information” on “the movement and position of the sound source” (such a set is referred to tentatively as an object), and the realization (rendering) according to a replay environment is entrusted to the replay environment side.

Using the object-based technique described above, a sound field and sound quality can be reproduced in accordance with features and performance of a replay environment catering to the intentions of a producer not only in the current state in which diversification of replay environments continues to progress but also when performance of a replay environment improves by leaps and bounds in the future.

Note that, as renderers to realize the “rendering” described above, there are various kinds of renderers according to replay environments, from a renderer for a headphone to a sound field renderer using a number of speakers for a 22.2 channel system or an immersive environment. As the sound field renderer for an immersive environment, a plurality of techniques have currently been proposed, and various techniques such as wave field synthesis (WFS), a boundary surface control principle (BoSC), a technique obtained by simplifying Kirchhoff's integral theorem (JP 4775487B, JP 4674505B, and the like), and the like are known.

Measurement 2 shown in FIG. 4C is a measurement of a transfer function for causing the user 0 to perceive a sound in a way that, when the object-based sound field reproduction technique described above is employed, a sound source that is to be localized at an arbitrary position outside the closed surface 1B is localized at the position and the sound emitted from the position is perceived in the form of being affected by an echo in the site A.

Specifically, in Measurement 2, a transfer function (impulse response) is measured which indicates how a sound (a signal for measurement), which is emitted from the speakers 2A for measurement which are disposed at arbitrary positions outside the closed surface 1A on which the plurality of mics 5A are disposed, reaches each of the mics 5A including influence of echo in the site A.

Here, in the present embodiment, sound field reproduction using the transfer functions which are measured in Measurement 1 and Measurement 2 is set to be realized based on the following idea.

In other words, when a wave surface on which a sound that will reach the closed surface 1B intersects the closed surface 1B is assumed, the plurality of speakers 2B perform replay so that the assumed wave surface is created inside the closed surface 1B.

3-2. Regarding Measurement 1

Hereinbelow, a specific example of the transfer function measurement technique of Measurement 1 will be described with reference to FIGS. 5 to 7.

First, FIG. 5 shows the plurality of speakers 2B and the closed surface 1B, and the plurality of mics 5B and the closed surface 4B, disposed in the site B (reproduction environment) in which the user 0 is present. As understood from description above, the mics 5B disposed in the site B are provided to collect sounds produced by the user 0 in real time.

In this case, the mics 5B must have inward directivity (in an inward direction of the closed surface 4B) to realize a system in which a sound produced by the user 0 who is inside the closed surface 4B is affected by echo in the site A and output from the speakers 2B. To this end, directional microphones are used as each of the mics 5B, and are installed so that directions of directivity thereof face the inward direction of the closed surface 4B.

In addition, the speakers 2B are installed so that directions of sound emission thereof face the inward direction of the closed surface 1B. In other words, directional speakers are used as the speakers 2B, and directivity thereof is set to be inward.

Note that it is desirable to set a direction of directivity at that time to be perpendicular to the closed surface.

Here, in description below, the number of speakers 2B which are disposed in the site B is set to N, and the number of mics 5B which are disposed in the site B is set to M. As shown in the drawing, the mics 5B are set to be disposed at each of positions of V1, V2, V3, . . . , and VM on the closed surface 4B, and the speakers 2B are set to be disposed at each of positions of W1, W2, W3, . . . , and WN on the closed surface 1B.

Note that the mics 5B which are disposed at each of the positions described above may be denoted hereinbelow as mics V1, V2, V3, . . . , and VM corresponding to the respective disposition positions thereof. Likewise, the speakers 2B may be denoted as speakers W1, W2, W3, . . . , and WN corresponding to the respective disposition positions thereof.

FIGS. 6 and 7 are illustrative diagrams regarding the specific transfer function measurement technique of Measurement 1.

In FIGS. 6 and 7, the plurality of speakers 2A, the closed surface 1A, the plurality of mics 5A and a closed surface 4A of the site A (measurement environment) are shown.

As seen from the drawings, the number of disposition positions of the speakers 2A on the closed surface 4A of the site A is set to M in description herein. The disposition positions are denoted by Q1, Q2, Q3, . . . , and QM as shown in the drawings.

In addition, the number of mics 5A which are disposed on the closed surface 1A of the site A is set to N, and the disposition positions thereof are denoted by R1, R2, R3, . . . , and RN as shown in the drawings.

Note that the speakers 2A disposed in each of the positions described above may also be denoted as speakers Q1, Q2, Q3, . . . , and QM corresponding to the respective disposition positions thereof and the mics 5A may also be denoted as mics R1, R2, R3, . . . , and RN corresponding to the respective disposition positions thereof in the site A.

Here, with respect to the speakers 2A and the mics 5A of the site A, the speakers 2A and the mics 5A must have outward directivity for the purpose of obtaining a transfer function for causing the user 0 to perceive a sound that the user 0 has produced and that is affected by an echo in the site A. Due to this point, the speakers 2A are set to have outward directivity by using directional speakers, and the mics 5A are also set to have outward directivity as shown in the drawing by using directional microphones. It is also desirable in this case to set the direction of the directivity to be perpendicular to the closed surface.

Here, for the purpose of convenience of the present description, the closed surface 4A of the site A is set to have the same size and shape as the closed surface 4B of the site B, and the positional relation of the respective speakers 2A on the closed surface 4A (an arrangement order and a disposition interval of Q1, Q2, Q3, . . . , and QM) is set to be the same as the positional relation of the respective mics 5B on the closed surface 4B (an arrangement order and a disposition interval of V1, V2, V3, . . . , and VM).

In addition, the closed surface 1A of the site A is set to have the same size and shape as the closed surface 1B of the site B, and the positional relation of the respective mics 5A on the closed surface 1A (an arrangement order and a disposition interval of R1, R2, R3, . . . , and RN) is set to be the same as the positional relation of the respective speakers 2B on the closed surface 1B (an arrangement order and a disposition interval of W1, W2, W3, . . . , and WN).

Based on the premises described above, in Measurement 1, measurement sounds are sequentially output from the speakers 2A of each of the positions (Q1 to QM) on the closed surface 4A, and respective transfer functions from the speakers 2A which have output the measurement sounds to the positions of the respective mics 5A (R1 to RN) on the closed surface 1A are sequentially obtained.

In FIG. 6, a state in which a measurement sound is output from the speaker 2A at the position of Q1 and the measurement sound affected by reflection or the like in the site A is collected by the respective mics 5A of R1 to RN is shown.

Based on the sound collection signal of the respective mics 5A obtained as described above, N transfer functions from the speaker 2A at the position of Q1 to the respective mics 5A of R1 to RN can be obtained.

In the present example herein, a sound that is based on a time stretched pulse (TSP; swept sine also has the same meaning) signal is output as the measurement sound described above, and an impulse response is measured from the sound collection signal. Data of the impulse response is a transfer function that indicates how a sound output from a given speaker 2A is affected by an echo of the site A and then reaches a given mic 5A.
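As an illustration of how an impulse response might be recovered from a swept-sine recording, the following is a minimal sketch and not the measurement procedure of the present example. It assumes a linear sweep standing in for the TSP signal and a regularised frequency-domain division; the function names `swept_sine` and `estimate_impulse_response`, the sweep design, and the constant `eps` are all hypothetical and do not appear in the present description.

```python
import numpy as np


def swept_sine(n_samples, fs, f0=0.0, f1=None):
    """Linear swept sine from f0 to f1 Hz (hypothetical stand-in for a TSP signal)."""
    if f1 is None:
        f1 = fs / 2.0
    t = np.arange(n_samples) / fs
    duration = n_samples / fs
    # The instantaneous frequency rises linearly from f0 to f1 over the sweep.
    phase = 2.0 * np.pi * (f0 * t + 0.5 * (f1 - f0) / duration * t ** 2)
    return np.sin(phase)


def estimate_impulse_response(excitation, recording, ir_length, eps=1e-8):
    """Estimate an impulse response by regularised spectral division.

    excitation: measurement signal fed to one speaker (1-D array)
    recording:  sound collection signal of one mic (1-D array)
    eps:        small assumed constant that avoids division by near-zero bins
    """
    # Zero-pad so that the circular convolution implied by the FFT does not wrap.
    n_fft = len(excitation) + len(recording)
    X = np.fft.rfft(excitation, n_fft)
    Y = np.fft.rfft(recording, n_fft)
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
    return np.fft.irfft(H, n_fft)[:ir_length]
```

In such a sketch, `recording` would be the sound collection signal of one mic 5A while one speaker 2A emits `excitation`, and the returned array would play the role of one transfer function. A sweep that does not cover the whole band leaves some frequency bins poorly excited, in which case the choice of `eps` matters more.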

In addition, in FIG. 7, a state in which a measurement sound is output from the speaker 2A at the position of Q2 and the measurement sound which has been affected by reflection in the site A or the like is collected by the respective mics 5A of R1 to RN is shown.

Based on the sound collection signal of the respective mics 5A obtained in this way, impulse responses from the speaker 2A at the position of Q2 to the respective mics 5A of R1 to RN are measured. Accordingly, N transfer functions from the speaker 2A at the position of Q2 to the respective mics 5A of R1 to RN can be obtained.

Measurement of the transfer functions based on the sound collection signal of the respective mics 5A of R1 to RN described above is executed up to the position of QM by sequentially changing the speakers 2A which output the measurement sound. Accordingly, as the transfer functions, a total of M×N transfer functions including N transfer functions from the speaker 2A of Q1 to each of the mics 5A of R1 to RN (which are denoted by $QR_{11}$ to $QR_{1N}$), N transfer functions from the speaker 2A of Q2 to each of the mics 5A of R1 to RN (which are denoted by $QR_{21}$ to $QR_{2N}$), . . . , and N transfer functions from the speaker 2A of QM to each of the mics 5A of R1 to RN (which are denoted by $QR_{M1}$ to $QR_{MN}$) can be obtained.

The M×N transfer functions can be expressed in a matrix as shown by Expression 1 below.

[Math 1]

$$
\begin{pmatrix}
QR_{11} & QR_{21} & \cdots & QR_{M1} \\
QR_{12} & QR_{22} & \cdots & QR_{M2} \\
\vdots & \vdots & \ddots & \vdots \\
QR_{1N} & QR_{2N} & \cdots & QR_{MN}
\end{pmatrix}
\qquad \text{[Expression 1]}
$$
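One possible in-memory layout for the M×N transfer functions is sketched below. It is an assumption made for illustration only: the array name `first_tf`, the sizes, and the numeric tags are hypothetical. Each impulse response is stored along a third axis, indexed first by speaker position Q and then by mic position R, and the layout of Expression 1 (mic index along the rows, speaker index along the columns) is then the transpose of the first two axes.

```python
import numpy as np

# Illustrative sizes only (assumptions): M speaker positions Q1..QM,
# N mic positions R1..RN, impulse responses of L samples.
M, N, L = 3, 4, 8

# first_tf[m, n] holds the impulse response from speaker Q(m+1) to mic R(n+1).
first_tf = np.zeros((M, N, L))
for m in range(M):
    for n in range(N):
        # Tag the first sample so that the response for Q2 -> R3 starts with 23.
        first_tf[m, n, 0] = 10 * (m + 1) + (n + 1)

# Expression 1 places the mic index along rows and the speaker index along columns,
# which corresponds to transposing the first two axes of the array.
expression_1 = first_tf[:, :, 0].T
print(expression_1)
```

With these tags, the first row of the printed matrix lists the responses that arrive at mic R1 from Q1, Q2, and Q3, matching the first row of Expression 1.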

Note that, in obtaining the M×N transfer functions, the measurement sound may be sequentially output at each position of Q1 to QM, and the number of speakers 2A necessary for the output may be a minimum of 1. In other words, by sequentially disposing one speaker 2A at each position of Q1, Q2, Q3, . . . , and QM and causing the speaker to emit the sound, the measurement necessary for obtaining the M×N transfer functions can be performed.

Moving the speaker 2A for each measurement, however, is cumbersome, and thus in the present example, measurement of the M×N transfer functions is performed by disposing a speaker 2A at each position of Q1 to QM and sequentially selecting, from among these speakers 2A, the speaker 2A which outputs the measurement sound.

Here, a transfer function which is measured in Measurement 1 and indicates how a sound produced by the user 0 is affected by an echo in the site A and transferred is also referred to as a first transfer function.

FIG. 8 is an illustrative diagram regarding a system configuration for performing measurement of a transfer function of Measurement 1 described above.

As shown in FIG. 8, M speakers 2A, N mics 5A, and a measurement device 10 are provided to realize Measurement 1.

In the measurement device 10, M terminal units 11 (11-1 to 11-M) to connect the M speakers 2A to the device and N terminal units 12 (12-1 to 12-N) to connect the N mics 5A thereto are provided.

In addition, inside the measurement device 10, an A-D converter (ADC) and amplifying unit 13, a transfer function measurement unit 14, a control unit 15, a measurement signal output unit 16, a D-A converter (DAC) and amplifying unit 17, and a selector 18 are provided.

The measurement signal output unit 16 outputs a TSP signal as a measurement signal to the DAC and amplifying unit 17 based on control of the control unit 15. The DAC and amplifying unit 17 D-A-converts and amplifies the input measurement signal and then outputs the signal to the selector 18.

The selector 18 selects one terminal unit 11 (i.e., a speaker 2A) which is instructed by the control unit 15 among the terminal units 11-1 to 11-M and then outputs the measurement signal input from the DAC and amplifying unit 17 thereto.

The ADC and amplifying unit 13 amplifies and A-D-converts a sound collection signal received from each mic 5A and input from each terminal unit 12 and then outputs the signal to the transfer function measurement unit 14.

The transfer function measurement unit 14 performs measurement of an impulse response (transfer function) based on the sound collection signal received from each mic 5A and input from the ADC and amplifying unit 13 according to an instruction from the control unit 15.

The control unit 15 is configured as, for example, a micro-computer provided with a central processing unit (CPU), a read only memory (ROM), and a random access memory (RAM), and performs overall control of the measurement device 10 by executing processes according to programs stored in the ROM and the like.

Particularly, the control unit 15 of this case performs control over the measurement signal output unit 16, the selector 18, and the transfer function measurement unit 14 so that the measurement operation of Measurement 1 described above is realized. To be specific, the control unit 15 controls the measurement signal output unit 16 and the selector 18 so that sound emission based on the measurement signal is sequentially performed by the respective speakers 2A of Q1, Q2, Q3, . . . , and QM, and controls measurement timings of the transfer function measurement unit 14 so that measurement of the transfer functions is performed based on the sound collection signal of each mic 5A in synchronization with the timings of sound emission by each speaker 2A.

Accordingly, measurement of the M×N transfer functions described above is realized.
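The control sequence of Measurement 1 can be sketched roughly as a loop over the speakers, as below. This is an illustration only: `emit_and_record` and `estimate_ir` are hypothetical callables standing in for the selector 18 plus sound collection and for the transfer function measurement unit 14, and the toy stand-ins are not real acoustics.

```python
def run_measurement_1(num_speakers, emit_and_record, estimate_ir):
    """Return qr[m][n]: impulse response from speaker Q(m+1) to mic R(n+1)."""
    qr = []
    for speaker_index in range(num_speakers):
        # Selector step: route the measurement signal to one speaker only,
        # and capture all N mic signals in sync with that emission.
        recordings = emit_and_record(speaker_index)
        # Measurement step: one impulse response per mic.
        qr.append([estimate_ir(recording) for recording in recordings])
    return qr


def fake_emit_and_record(speaker_index, num_mics=3):
    """Toy room (hypothetical): each path is a delayed, attenuated unit pulse."""
    return [[0.0] * (speaker_index + mic) + [1.0 / (1 + mic)]
            for mic in range(num_mics)]


def identity_estimate(recording):
    """Toy estimator (hypothetical): the recording is already the response."""
    return list(recording)


qr = run_measurement_1(2, fake_emit_and_record, identity_estimate)
```

With M = 2 and N = 3 in this toy setup, `qr` holds 2 × 3 responses, indexed so that `qr[m][n]` plays the role of QR_((m+1)(n+1)).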

Here, from a practical perspective, an impulse response, which is the time-axis expression of a transfer function, includes a direct sound and an early reflection sound in addition to a reverberant sound component, as shown in FIG. 9, due to the directivity of the speakers and mics. Depending on the case, these components are likely to be an obstructive factor in producing a sense of presence.

Note for the sake of clarification that a direct sound is a sound which is emitted from a speaker 2A and directly reaches a mic 5A (without going through reflection in the site A).

Thus, in the present example, a measured impulse response is decomposed into components of a direct sound, an early reflection sound, and a reverberant sound on the time axis, and the balance of the components is changed before they are synthesized again.

A configuration for the process is shown in FIG. 10.

Impulse response measurement data in the drawing is data of an impulse response (time axis waveform data) measured based on a sound collection signal by a mic 5A.

This impulse response measurement data is decomposed into a direct sound, an early reflection sound, and a reverberant sound on the time axis by a signal component decomposition processing unit 19 as shown in the drawing.

With regard to the direct sound and the early reflection sound, multiplication units 20 and 21 change the balance of the sounds respectively (adjust levels). The components of the direct sound and the early reflection sound whose balance has been adjusted in this way and the component of the reverberant sound obtained by the signal component decomposition processing unit 19 are added together by an addition unit 22.

The transfer functions used in the present example are set to be obtained by performing the component decomposition and balance adjustment described above on the measured (raw) impulse response data.
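The decomposition and rebalancing can be illustrated with a simple time-window split. This is a sketch under assumptions: the text does not say how the segment boundaries are chosen, so `direct_end` and `early_end` (sample indices) and the two gains are hypothetical parameters.

```python
def decompose(ir, direct_end, early_end):
    """Split an impulse response into three zero-padded time segments."""
    n = len(ir)
    direct = [ir[i] if i < direct_end else 0.0 for i in range(n)]
    early = [ir[i] if direct_end <= i < early_end else 0.0 for i in range(n)]
    reverb = [ir[i] if i >= early_end else 0.0 for i in range(n)]
    return direct, early, reverb


def rebalance(ir, direct_end, early_end, direct_gain, early_gain):
    """Scale the direct and early parts, keep the reverb, and sum them again."""
    direct, early, reverb = decompose(ir, direct_end, early_end)
    return [direct_gain * d + early_gain * e + r
            for d, e, r in zip(direct, early, reverb)]


measured = [1.0, 0.5, 0.25, 0.125]
adjusted = rebalance(measured, direct_end=1, early_end=3,
                     direct_gain=0.0, early_gain=0.5)
```

Here the two gains play the role of the multiplication units 20 and 21, and the final sum plays the role of the addition unit 22; setting `direct_gain` to zero removes the direct sound entirely.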

3-3. Regarding Measurement 2

FIG. 11 is an illustrative diagram regarding a specific technique for measuring a transfer function of Measurement 2.

Measurement 2 described above is performed so that a sound source to be localized at an arbitrary position outside the closed surface 1B is localized at that position, and so that a sound emitted from the position is perceived by the user 0 in the form of an echo in the site A. To this end, transfer functions (impulse responses) are measured, each indicating how a sound emitted from a speaker 2A for measurement, which is disposed at an arbitrary position outside the closed surface 1A, reaches each of the mics 5A, including the influence of echo in the site A.

Specifically, in Measurement 2, the speaker 2A is disposed at the position in the site A at which the sound source to be reproduced is desired to be localized, a measurement sound output from the speaker 2A is collected by each of the mics 5A on the closed surface 1A, and then the respective impulse responses are measured. Accordingly, a group of transfer functions can be obtained with which the sound source can be localized at the position at which the speaker 2A is disposed and a sound based on the sound source can be perceived as a sound affected by an echo in the site A.

Here, when there are a plurality of positions at which the sound source is desired to be localized, the same measurement of the transfer functions is performed at the plurality of positions in the site A. For example, after transfer functions are measured by performing sound emission of a measurement sound at the position of the speaker 2A indicated by the solid line in FIG. 11 and sound collection by each of the mics 5A, transfer functions are measured by performing sound emission of a measurement sound at the position of the speaker 2A indicated by the dashed line and sound collection by each of the mics 5A.

When there are a plurality of “positions at which the sound source is desired to be localized” as described above, measurement of transfer functions is performed for each of the “positions at which the sound source is desired to be localized.”

Here, a transfer function which is measured in Measurement 2 and indicates how a sound emitted from an arbitrary position outside the closed surface 1A reaches the closed surface 1A side, also including the influence of an echo in the site A, is also referred to hereinafter as a second transfer function.

Note for the sake of clarification that, in Measurement 2, a transfer function that can also express the directivity of a sound source can be obtained according to the direction in which the speaker 2A which emits a measurement sound faces relative to the closed surface 1A.

Measurement 2 described above can also be realized using the measurement device 10 shown in FIG. 8 above.

In this case, however, the number of connected speakers 2A corresponds to the number of positions at which the sound source is desired to be localized. Specifically, when speakers 2A are connected in the same number as the positions at which the sound source is desired to be localized, the control unit 15 controls the selector 18 to sequentially select the speakers 2A which will output measurement sounds and controls the transfer function measurement unit 14 to execute a transfer function measurement process in synchronization with the output timings of the measurement sounds.

4. Sound Field Reproduction Based on Transfer Functions

4-1. Sound Field Reproduction Based on a First Transfer Function

As described above, the number of the first transfer functions is a total of M×N, including N transfer functions from the speaker 2A of Q1 to each of the mics 5A of R1 to RN (QR₁₁ to QR_(1N)), N transfer functions from the speaker 2A of Q2 to each of the mics 5A of R1 to RN (QR₂₁ to QR_(2N)), . . . , and N transfer functions from the speaker 2A of QM to each of the mics 5A of R1 to RN (QR_(M1) to QR_(MN)).

Here, to confirm, in the site B (reproduction environment) shown in FIG. 5, the number of speakers 2B which are disposed on the closed surface 1B is N, and thus the number of channels of acoustic signals that must be finally obtained is N.

When an acoustic signal that must be output from the position of W1 is considered on the above premise, for example, a sound which is emitted from the user 0 in each of the directions of V1 to VM on the closed surface 4B, affected by an echo in the site A, and returns to the position of W1 must be output from the position of W1.

In other words, when an acoustic signal to be output from the speaker 2B at the position of W1 is set to a signal W₁, the signal W₁ can be expressed as follows.

$W_{1} = V_{1} \times QR_{11} + V_{2} \times QR_{21} + V_{3} \times QR_{31} + \ldots + V_{M} \times QR_{M1}$

In the above formula, V₁ to V_(M) denote the sound collection signals of the mics at the positions of V1 to VM.

In other words, the signal W₁ is the sum of M signals, each obtained by processing the sound output in one of the directions of V1 to VM (Q1 to QM) with the corresponding transfer function among the transfer functions (QR₁₁, QR₂₁, . . . , and QR_(M1)) of W1 (R1).

Likewise for the positions of W2 and W3, sounds which are emitted from the user 0 in each of the directions of V1 to VM, affected by an echo in the site A, and then return to the positions of W2 and W3 must be output, and the signals W₂ and W₃ which must be output from the speakers 2B at the positions of W2 and W3 can be expressed as follows.

$W_{2} = V_{1} \times QR_{12} + V_{2} \times QR_{22} + V_{3} \times QR_{32} + \ldots + V_{M} \times QR_{M2}$

$W_{3} = V_{1} \times QR_{13} + V_{2} \times QR_{23} + V_{3} \times QR_{33} + \ldots + V_{M} \times QR_{M3}$

In other words, the signal W₂ is the sum of M signals, each obtained by processing the sound output in one of the directions of V1 to VM (Q1 to QM) with the corresponding transfer function among the transfer functions (QR₁₂, QR₂₂, . . . , and QR_(M2)) of W2 (R2), and the signal W₃ is the sum of M signals, each obtained by processing the sound output in one of the directions of V1 to VM (Q1 to QM) with the corresponding transfer function among the transfer functions (QR₁₃, QR₂₃, . . . , and QR_(M3)) of W3 (R3).

The same applies when obtaining other signals W₄ to W_(N).

Based on the above description, the following Expression 2 is obtained when the arithmetic expressions of the signals W₁ to W_(N) are expressed as a matrix.

$\begin{matrix}\left\lbrack \text{Math 2} \right\rbrack & \\ \begin{pmatrix} W_{1} \\ W_{2} \\ \vdots \\ W_{N} \end{pmatrix} = \begin{pmatrix} QR_{11} & QR_{21} & \cdots & QR_{M1} \\ QR_{12} & QR_{22} & \cdots & QR_{M2} \\ \vdots & \vdots & \ddots & \vdots \\ QR_{1N} & QR_{2N} & \cdots & QR_{MN} \end{pmatrix}\begin{pmatrix} V_{1} \\ V_{2} \\ \vdots \\ V_{M} \end{pmatrix} & \left\lbrack \text{Expression 2} \right\rbrack \end{matrix}$

When the arithmetic operation expressed by Expression 2 is performed, the signals W₁ to W_(N) can be obtained which must be output from each of the speakers 2B of W1 to WN to cause the user 0 to perceive a sound field that is sensed as if a sound produced by the user 0 in the closed surface 1B were echoing in the site A.
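As a rough time-domain illustration of Expression 2, each "product" of a signal and a transfer function can be read as a convolution with the corresponding impulse response. The sketch below assumes that reading, assumes all input signals share one length and all impulse responses share one length, and uses hypothetical names throughout.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def matrix_convolve(v, qr):
    """Compute W_n = sum over m of (V_m convolved with QR_mn).

    v[m] is the signal V_(m+1); qr[m][n] is the impulse response QR_(m+1)(n+1).
    """
    num_inputs = len(v)
    num_outputs = len(qr[0])
    outputs = []
    for n in range(num_outputs):
        total = None
        for m in range(num_inputs):
            term = convolve(v[m], qr[m][n])
            total = term if total is None else [a + b for a, b in zip(total, term)]
        outputs.append(total)
    return outputs


# Toy case with M = 2 inputs and N = 2 outputs (values are arbitrary).
v = [[1.0, 0.0], [0.0, 1.0]]
qr = [[[1.0], [2.0]], [[3.0], [4.0]]]
w = matrix_convolve(v, qr)
```

In this toy case `w[0]` combines V₁ with QR₁₁ and V₂ with QR₂₁, mirroring the expression for W₁ above.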

4-2. Sound Field Reproduction Based on a Second Transfer Function

As understood from the above description, Technique 2, which uses the second transfer function, causes the user 0 to perceive an environmental sound of the site A also including echoes in the site A, but unlike Technique 1, a process on a sound collection signal of the mics 5B using a transfer function is not performed.

In Technique 2, a process is performed on a predetermined sound source that is recorded in advance using a second transfer function, not on a sound collection signal of the mics 5B.

Specifically, in Technique 2, by performing a process on a predetermined sound source using the N second transfer functions which are measured for the disposition position of one speaker 2A in Measurement 2 described above, signals which must be output from each speaker 2B disposed in the site B as a reproduction environment are obtained.

As the simplest example, when one given sound source is localized at one given position, N signals are obtained by processing acoustic signals that are based on the sound source with the second transfer functions which are measured based on the sound collection signals of each position of R1 to RN, and each of the signals may be output from the corresponding speaker 2B among the speakers 2B of W1 to WN in the reproduction environment.

Alternatively, when a sound source A is localized at a position a and a sound source B is localized at a position b, N signals are obtained for the sound source A by processing acoustic signals which are based on the sound source A with the N second transfer functions which have been obtained in measurement at the position a, and N signals are obtained for the sound source B by processing acoustic signals which are based on the sound source B with the N second transfer functions which have been obtained in measurement at the position b. Then, the N signals obtained on each of the sound source A and the sound source B sides are added together for each of the positions (W1 to WN) of the speakers 2B, and thereby the signals which must be output from the speakers 2B at each of the positions of W1 to WN are obtained.
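The two-source case can be sketched as follows, again reading "processing with a transfer function" as convolution with an impulse response (an assumption). The function names and the toy signals are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render(sources, num_speakers):
    """Sum, per speaker channel, every source filtered by its own N responses.

    sources: list of (signal, second_tfs), where second_tfs[n] is the impulse
    response measured from that source position to mic R(n+1).
    """
    length = max(len(signal) + len(ir) - 1
                 for signal, second_tfs in sources for ir in second_tfs)
    outputs = [[0.0] * length for _ in range(num_speakers)]
    for signal, second_tfs in sources:
        for n in range(num_speakers):
            for t, value in enumerate(convolve(signal, second_tfs[n])):
                outputs[n][t] += value
    return outputs


# Sound source A at position a, sound source B at position b; N = 2 speakers.
source_a = ([1.0, 0.0], [[1.0], [0.5]])
source_b = ([0.0, 2.0], [[0.0, 1.0], [1.0]])
out = render([source_a, source_b], num_speakers=2)
```

Each entry of `out` is the signal for one speaker 2B; shorter contributions are implicitly zero-padded to the longest one.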

5. Configuration of a Signal Processing System

FIG. 12 is a diagram for describing a configuration of a signal processing system for realizing the signal processing technique as an embodiment described above.

As shown in FIG. 12, the signal processing system according to the present embodiment is configured to have at least M mics 5B, a signal processing device 30, N speakers 2B, a display device 3, and a server device 25.

First, as a premise, data regarding map information that must be displayed for designation of position information by the user 0, image data that must be displayed corresponding to a place specified from designated position information, information of first transfer functions to be used in sound field reproduction of Technique 1, and object-based data to be used in sound field reproduction of Technique 2 are assumed to be stored in the server device 25.

Specifically, the server device 25 stores map data 25A, image data 25B, first transfer function information 25C, correspondence relation information 25D, and object-based data 25E.

The map data 25A is data supplied for display of the map information (map images). In addition, the image data 25B is image data for places which are reproduction targets, for example, image data obtained by photographing the appearance of each reproduction target place.

In addition, the first transfer function information 25C represents information of the first transfer functions measured for each of the reproduction target places in Measurement 1 described above.

In addition, the object-based data 25E comprehensively represents object-based data used in sound field reproduction of Technique 2. This object-based data 25E includes second transfer function information 25E1, which is information of the second transfer functions measured for each of the reproduction target places in Measurement 2 above, and an object-separated sound source 25E2.

The object-separated sound source 25E2 is a sound source present in a reproduction target place, and it may be considered as, for example, a necessary sound source extracted from a signal recorded at a reproduction target place. As a process of extracting this sound source, noise removal, reverberation suppression, or the like is performed on the recorded signal. Accordingly, sound source data which has a favorable S/N (signal-to-noise ratio) and also a suppressed reverberation feeling can be obtained. In other words, sound source data suitable for object-based sound field reproduction can be obtained.

The correspondence relation information 25D is information for realizing the operations of the present system, that is, displaying an image of a place according to designated position information and realizing a sound field corresponding to the place. Specifically, it is information in which a place, an image to be displayed corresponding to the place, a first transfer function to be used in sound field reproduction of Technique 1 corresponding to the place, an object-separated sound source (object sound source in the drawing) to be used in sound field reproduction of Technique 2 corresponding to the place, and second transfer functions are associated together as shown in FIG. 13.

In the present example, the image data, the first transfer functions, the second transfer functions, and the object-separated sound sources are managed with respective IDs.

In the correspondence relation information 25D, IDs for the image data, first transfer functions, second transfer functions, and object-separated sound sources that must be used corresponding to the places are described, and with the IDs, the actual data to be used in practice can be specified from the actual data stored as the image data 25B, the first transfer function information 25C, the second transfer function information 25E1, and the object-separated sound source 25E2.

Note that, in the correspondence relation information 25D shown in the drawing, with regard to data to be used in sound field reproduction of Technique 2, two each of object-separated sound sources and second transfer functions are associated with one place; this corresponds to a technique for localizing two respective sound sources at different positions in one place.
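One way to picture the ID-based association is a small lookup table, sketched below. The dictionary layout, the ID strings, and the placeholder payloads are all hypothetical; the text only states that the data are managed with respective IDs and associated per place.

```python
# Stores keyed by ID (placeholder strings stand in for the actual data).
IMAGE_DATA = {"img-1": "image of place 1"}
FIRST_TFS = {"tf1-1": "M x N first transfer functions of place 1"}
SECOND_TFS = {"tf2-1": "N second transfer functions, position a",
              "tf2-2": "N second transfer functions, position b"}
OBJECT_SOURCES = {"src-1": "object sound source A",
                  "src-2": "object sound source B"}

# Correspondence relation: one entry per place, holding only IDs.
CORRESPONDENCE = {
    "place-1": {
        "image_id": "img-1",
        "first_tf_id": "tf1-1",
        # Two (source, second transfer function) pairs for one place.
        "objects": [("src-1", "tf2-1"), ("src-2", "tf2-2")],
    },
}


def resolve(place):
    """Turn the IDs recorded for a place into the actual stored data."""
    entry = CORRESPONDENCE[place]
    return {
        "image": IMAGE_DATA[entry["image_id"]],
        "first_tf": FIRST_TFS[entry["first_tf_id"]],
        "objects": [(OBJECT_SOURCES[src], SECOND_TFS[tf])
                    for src, tf in entry["objects"]],
    }


bundle = resolve("place-1")
```

Keeping only IDs in the correspondence table lets several places share the same stored data without duplicating it.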

Returning to FIG. 12, the signal processing device 30 is provided with a communication unit 44, and can perform data communication with the server device 25 using the communication unit 44 via a network 26, for example, the Internet.

The signal processing device 30 is provided with M terminal units 31 (31-1 to 31-M) to connect M mics 5B to the device and N terminal units 39 (39-1 to 39-N) to connect N speakers 2B thereto.

In addition, the signal processing device 30 is also provided with a terminal unit 43 to connect the display device 3 also shown in FIG. 1 above.

Further, inside the signal processing device 30, an ADC and amplifying unit 32, addition units 33-1 to 33-M, howling control and echo cancellation units 34 and 36, a matrix convolution unit 35, addition units 37-1 to 37-N, a DAC and amplifying unit 38, a control unit 40, an operation unit 41, a display control unit 42, the communication unit 44, a memory 45, a reference sound replay unit 46, and a bus 48 are provided.

Here, each of the matrix convolution unit 35, the control unit 40, the display control unit 42, the communication unit 44, the memory 45, the reference sound replay unit 46, and a rendering unit 47 is connected to the bus 48, and thus they can perform data communication with each other via the bus 48.

Inside the signal processing device 30, sound collection signals from each of the mics 5B input through the terminal units 31-1 to 31-M are A-D-converted and amplified by the ADC and amplifying unit 32 for each channel.

The sound collection signals from each of the mics 5B, A-D-converted and amplified by the ADC and amplifying unit 32 for each channel, are input into the respective addition units 33 of the corresponding channels among the addition units 33-1 to 33-M.

The addition units 33-1 to 33-M add acoustic signals as reference sounds which have been replayed by the reference sound replay unit 46 to the sound collection signals of each of the channels of V1 to VM, which will be described again later.

The sound collection signals that pass through the addition units 33-1 to 33-M are supplied to the howling control and echo cancellation unit 34.

The howling control and echo cancellation unit 34 is provided to prevent howling caused by feedback, along with the howling control and echo cancellation unit 36 which is provided in the stage following the matrix convolution unit 35. The howling control and echo cancellation units 34 and 36 are connected to each other so as to perform linked processes as shown in the drawing.

Here, in the present system, the mics 5B and the speakers 2B are disposed in the same reproduction environment, and because they are disposed relatively close to each other, there is concern that excessive oscillation may occur in some cases through the interaction of the two. Thus, the present example attempts to prevent such excessive oscillation by providing the howling control and echo cancellation units 34 and 36.

The matrix convolution unit 35 performs a process based on the first transfer functions on each of the signals which are collected by each of the mics 5B and input via the howling control and echo cancellation unit 34, and thereby generates the signals that must be output from each of the speakers 2B to realize sound field reproduction as Technique 1.

Specifically, the matrix convolution unit 35 performs the process on the M signals (V₁ to V_(M)) input from the howling control and echo cancellation unit 34 based on the first transfer functions (QR₁₁ to QR_(MN)) instructed by the control unit 40, and then generates the N signals that must be output from each of the speakers 2B to realize sound field reproduction as Technique 1.

Herein, FIG. 14 shows a specific internal configuration example of the matrix convolution unit 35.

Note that this drawing shows a configuration example in which finite impulse response (FIR) digital filters that have expressions of the first transfer functions on a time axis (impulse responses) as coefficients are used.

In addition, in this drawing, the signals V₁ to V_(M) are set to indicate signals input to the matrix convolution unit 35 via the howling control and echo cancellation unit 34 as also understood from FIG. 12 above, and the signals W₁ to W_(N) are set to indicate signals input from the matrix convolution unit 35 to the howling control and echo cancellation unit 36.

First, as a premise, filters 50 of this case are assumed to be FIR digital filters.

The matrix convolution unit 35 of this case is provided with N filters 50 (whose reference numerals end with 1 to N) for each of the signals V₁ to V_(M). In this drawing, filters 50-11 to 50-1N to which the signal V₁ is input, filters 50-21 to 50-2N to which the signal V₂ is input, and filters 50-M1 to 50-MN to which the signal V_(M) is input are shown as representative examples.

For the filters 50-11 to 50-1N to which the signal V₁ is input, filter coefficients based on the first transfer functions QR₁₁ to QR_(1N) corresponding to the position of V1 (Q1) are set.

In addition, for the filters 50-21 to 50-2N to which the signal V₂ is input, filter coefficients based on the first transfer functions QR₂₁ to QR_(2N) corresponding to the position of V2 (Q2) are set, and for the filters 50-M1 to 50-MN to which the signal V_(M) is input, filter coefficients based on the first transfer functions QR_(M1) to QR_(MN) corresponding to the position of VM (QM) are set.

Although not illustrated in the drawing, filter coefficients based on the N first transfer functions corresponding to the positions of the mics 5B which collect the sounds of the signals are also set for the N filters 50 to which each of the other signals (V₃ to V_(M−1)) is input.

In addition, the matrix convolution unit 35 is provided with N addition units 51 (51-1 to 51-N). Each of the addition units 51-1 to 51-N receives the corresponding signals among the signals which have undergone a filter process based on the first transfer function in each filter 50, and then performs addition to obtain the signals W₁ to W_(N).

Specifically, signals obtained from the filters 50 whose reference numerals end with 1 are input to the addition unit 51-1, and signals obtained from the filters 50 whose reference numerals end with 2 are input to the addition unit 51-2. In addition, signals obtained from the filters 50 whose reference numerals end with N are input to the addition unit 51-N.

In other words, each of the addition units 51-1 to 51-N receives M signals processed with the first transfer functions of the position, among the positions of W1 to WN (R1 to RN), that corresponds to the numeric value at the end of its reference numeral.

The addition units 51-1 to 51-N add (combine) the M signals input as described above.

With the configuration described above, the arithmetic operations of the signals W₁ to W_(N) shown in Expression 2 above can be realized.

Note that the example shown herein is a time-axis arithmetic operation, for which a convolution operation may be performed. Alternatively, in the case of a frequency-axis operation, multiplication using the transfer functions may be performed.
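The relationship between the two forms can be written out as follows. The lowercase symbols for time-axis signals and the arguments $t$, $\tau$, and $\omega$ are notation introduced here for illustration and do not appear in the original expressions.

$w_{n}[t] = \sum_{m=1}^{M} \sum_{\tau} qr_{mn}[\tau] \, v_{m}[t - \tau] \qquad (n = 1, \ldots, N)$

$W_{n}(\omega) = \sum_{m=1}^{M} QR_{mn}(\omega) \, V_{m}(\omega) \qquad (n = 1, \ldots, N)$

The first line is the time-axis convolution carried out by the FIR filters and the addition units; the second is the equivalent per-frequency multiplication, which corresponds row by row to Expression 2.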

Description will be provided returning to FIG. 12.

The N signals (W₁ to W_(N)) obtained in the matrix convolution unit 35 undergo a process by the howling control and echo cancellation unit 36 for each channel, and then are respectively input to the addition units 37 of the corresponding channels among the addition units 37-1 to 37-N.

The addition units 37-1 to 37-N add a signal input from the rendering unit 47 to the signals input from the howling control and echo cancellation unit 36, and then output the signals to the DAC and amplifying unit 38.

The DAC and amplifying unit 38 performs D-A conversion and amplification on the output signals from the addition units 37-1 to 37-N for each channel, and then outputs the signals to the terminal units 39-1 to 39-N. Accordingly, the speakers 2B of W1 to WN of each channel emit sounds according to the acoustic signals of the corresponding channels.

The rendering unit 47 is provided to perform a signal process for realizing sound field reproduction as Technique 2.

The rendering unit 47 performs a process on an object-separated sound source transmitted from the server device 25 via the network 26 based on second transfer functions which are also transmitted from the server device 25 via the network 26 according to an instruction of the control unit 40, and thereby generates acoustic signals of N channels that must be output from each of the speakers 2B to cause the user 0 to perceive an environmental sound of the site A also including an echo in the site A.

Note that, as understood from the above description, when a plurality of sound sources are localized at different positions, the rendering unit 47 adds, for each channel, the acoustic signals of N channels obtained by processing each of the sound sources with the corresponding (N) second transfer functions, and thereby obtains the acoustic signals of N channels that must be output from each of the speakers 2B.

The display control unit 42 performs display control of the display device 3 which is connected via the terminal unit 43. Specifically, the display control unit 42 of this case causes the display device 3 to display images based on map data transmitted from the server device 25 via the network 26 and images based on image data also transmitted from the server device 25 via the network 26 according to an instruction of the control unit 40.

The memory 45 stores various kinds of data. Particularly, the memory 45 of this case is used to temporarily accumulate (buffer) data transmitted from the server device 25.

The control unit 40 is configured by a micro-computer provided with, for example, a CPU, a ROM, a RAM, and the like, and performs overall control over the signal processing device 30 by executing processes according to programs stored in, for example, the ROM and the like.

The operation unit 41 is connected to the control unit 40. The control unit 40 accepts operation information according to operations performed by the user 0 on the operation unit 41 and executes processes according to the operation information, thereby realizing operations according to the operations by the user 0.

Particularly, the control unit 40 of this case realizes a reproduction operation as an embodiment by executing the process shown next in FIG. 15.

FIG. 15 is a flowchart showing the content of a process to be executed in the present system to realize a reproduction operation as an embodiment.

Note that, in FIG. 15, the process for the signal processing device is executed by the control unit 40 provided in the signal processing device 30, and the process for the server device is executed by a control unit (not illustrated) provided in the server device 25.

In addition, when the processes shown in the drawing are to be started, the devices are assumed to be in the state in which necessary position information has already been designated based on an operation input by the user 0 through the operation unit 41.

In FIG. 15, the control unit 40 of the signal processing device 30 performs a process for transmitting designated position information to the server device 25 in Step S101. In other words, the designated position information is transmitted by the communication unit 44 to the server device 25 via the network 26.

The control unit of the server device 25 specifies a place corresponding to the designated position information in Step S201 according to the reception of the designated position information transmitted from the signal processing device 30 side. The specification of the place is performed with reference to, for example, correspondence relation information between predetermined position information and the place.

After the place is specified in Step S201, the control unit of the server device 25 transmits image data, a first transfer function, a second transfer function, and an object-separated sound source according to the specified place to the signal processing device 30 in Step S202.

Specifically, based on the correspondence relation information 25D, among the image data, the first transfer functions, the second transfer functions, and the object-separated sound sources which are stored respectively as the image data 25B, the first transfer function information 25C, the second transfer function information 25E1, and the object-separated sound source 25E2, the image data, the first transfer function, the second transfer function, and the object-separated sound source corresponding to the specified place are transmitted to the signal processing device 30.

On the signal processing device 30 side, execution control of image display and of the processes using the first and second transfer functions is performed in Step S102 according to the transmission of the image data, the first transfer function, the second transfer function, and the object-separated sound source from the server device 25 as described above. In other words, with respect to the image data transmitted from the server device 25 side, an instruction is given to the display control unit 42 to cause the display device 3 to display the image data. In addition, with respect to the first transfer function transmitted from the server device 25 side, an instruction is given to the matrix convolution unit 35 to execute the arithmetic operation of Expression 2 above based on the first transfer function. In addition, with respect to the second transfer function and the object-separated sound source transmitted from the server device 25 side, an instruction is given to the rendering unit 47 to cause the rendering unit 47 to execute a rendering process based on the second transfer function and the object-separated sound source.
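The exchange in Steps S101, S201, S202, and S102 can be summarized in a toy sketch. The class names, method names, dictionary keys, and placeholder payloads are hypothetical, and a direct method call stands in for communication over the network 26.

```python
class ServerDevice:
    """Toy stand-in for the server device 25."""

    def __init__(self, places_by_position):
        self.places_by_position = places_by_position

    def handle_position(self, position):
        # Step S201: specify the place from the designated position.
        place = self.places_by_position[position]
        # Step S202: return the data associated with that place.
        keys = ("image", "first_tf", "second_tf", "object_source")
        return {key: place[key] for key in keys}


class SignalProcessingDevice:
    """Toy stand-in for the control unit 40 of the signal processing device 30."""

    def __init__(self, server):
        self.server = server
        self.instructions = []

    def reproduce(self, position):
        # Step S101: send the designated position information.
        data = self.server.handle_position(position)
        # Step S102: instruct each unit with the data it needs.
        self.instructions.append(("display_control_unit", data["image"]))
        self.instructions.append(("matrix_convolution_unit", data["first_tf"]))
        self.instructions.append(
            ("rendering_unit", (data["second_tf"], data["object_source"])))
        return self.instructions


server = ServerDevice({
    (35.0, 139.0): {"image": "img", "first_tf": "tf1",
                    "second_tf": "tf2", "object_source": "src"},
})
device = SignalProcessingDevice(server)
issued = device.reproduce((35.0, 139.0))
```

The recorded `issued` list simply shows which unit receives which data; the actual display, convolution, and rendering are outside the scope of this sketch.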

Accordingly, an image corresponding to the place specified from thedesignated position information can be presented to the user 0, a soundfield in which a sound emitted by the user 0 is sensed as if it wereechoing in the place specified from the designated position informationcan be provided, and the user 0 can be caused to perceive anenvironmental sound of the place including an echo sound of the place.

According to the signal processing system of the present embodimentdescribed above, a sense of immersion for the user can be heightenedmore than when only image information is presented.

Here, as covered above, the reference sound replay unit 46 is provided to output a reference sound in the present embodiment.

As this reference sound, sound data prepared in advance (which may use a collected sound as a source, or may be an artificial sound) is used, rather than a sound recorded in the site B in real time.

In terms of its intention, this is echolocation as in Technique 1, and by continuously outputting the same sound source material even when reproduction target places are different, it is possible to present what kind of space each place is using acoustic information. In this case, the structures of the places and the like can be understood from the acoustic information with higher reproducibility than when only sounds that are collected in real time are simply processed with a first transfer function and then output.

As shown in FIG. 12, the reference sound replayed by the reference sound replay unit 46 is added by the addition units 33-1 to 33-M to each of the sound collection signals from the mics 5B (which have undergone A-D conversion and amplification by the ADC and amplifying unit 32).

The matrix convolution unit 35 performs an arithmetic operation using Expression 2 above based on the sound collection signals (V₁ to V_(M)) of the respective channels to which the reference sound has been added as described above. Signals of N channels (W₁ to W_(N)) obtained in the process by the matrix convolution unit 35 in this way go through the howling control and echo cancellation unit 36, the addition units 37, the DAC and amplifying unit 38, and the terminal units 39, and then are output from the corresponding speakers 2B.

Accordingly, the effect of echolocation is heightened, and thereby the sense of immersion for the user 0 can be further increased.
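For illustration only, the signal path described above (the reference sound added to each of the M sound collection channels, followed by a matrix convolution producing N output channels) can be sketched as follows. This is a minimal sketch and not the implementation of the matrix convolution unit 35: the function names, the use of short impulse responses as transfer functions, the indexing `transfer[m][n]`, and all numeric values are assumptions introduced for the example, and the howling control and echo cancellation stage is omitted.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def add_reference(mic_signals, reference):
    """Add the same reference sound to every mic channel.

    Assumes the reference has the same length as each mic signal.
    """
    return [[v + r for v, r in zip(channel, reference)] for channel in mic_signals]


def matrix_convolve(transfer, mic_signals):
    """Compute N output channels from M input channels.

    transfer[m][n] is the (assumed) impulse response applied between
    input channel m and output channel n; all responses share one length.
    """
    n_out = len(transfer[0])
    length = len(mic_signals[0]) + len(transfer[0][0]) - 1
    outputs = [[0.0] * length for _ in range(n_out)]
    for m, v in enumerate(mic_signals):
        for n in range(n_out):
            for k, y in enumerate(convolve(v, transfer[m][n])):
                outputs[n][k] += y
    return outputs


# Hypothetical data: M = 2 mic channels, N = 2 speaker channels.
mics = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
reference = [0.5, 0.0, 0.0]
transfer = [
    [[1.0, 0.0], [0.0, 1.0]],  # from mic channel 1
    [[0.5, 0.0], [1.0, 0.0]],  # from mic channel 2
]
speaker_signals = matrix_convolve(transfer, add_reference(mics, reference))
```

Because the reference sound is added before the convolution, it is processed with the same transfer functions as the sound emitted by the user 0.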

Here, in the above description, the case in which the rendering process for realizing Technique 2 is executed by the signal processing device 30 placed on the reproduction environment side on which the user 0 is present has been exemplified; however, the rendering process can also be set to be performed in a necessary server device on the network 26 (in other words, performed on a so-called cloud) which is isolated from the reproduction environment.

FIG. 16 is a diagram showing a system configuration example in which the rendering process of Technique 2 is set to be performed on a cloud.

Note that this drawing shows the configuration example in which the rendering process is performed in the server device 25; however, a server device that stores data such as the map data 25A, the first transfer function information 25C, and the like may be formed as a separate body from the server device which executes the rendering process.

As shown in the drawing, the server device 25 is provided with a rendering unit 52 in this case. In addition, the signal processing device 30 is provided with an output control unit 53 instead of the rendering unit 47 in this case.

According to specification of the place based on designated position information, the server device 25 of this case performs a rendering process in the rendering unit 52 using the second transfer function and the object-separated sound source corresponding to the place.

In this case, the server device 25 transmits acoustic signals (of N channels) that have undergone the rendering process by the rendering unit 52 to the signal processing device 30.

The control unit 40 of the signal processing device 30 of this case causes the output control unit 53 to output the respective acoustic signals of N channels transmitted from the server device 25 as described above to the addition units 37 of the corresponding channels out of the addition units 37-1 to 37-N.

When the rendering process is set to be executed on a cloud in this way, a processing burden on the signal processing device 30 can be effectively lightened.

Note that whether the rendering process is to be performed on the signal processing device 30 side (local side) or on the cloud may be appropriately switched according to the speed of the network, a ratio of processing capabilities between the cloud and local side, and the like.
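One conceivable way to make this switching decision is to compare a rough estimate of the local rendering time with the cloud rendering time plus the time needed to transfer the rendered signals. The sketch below is only an assumption about how such a criterion might look; the function name, its parameters, and the example figures are hypothetical and do not appear in the embodiment.

```python
def choose_render_site(render_ops, local_ops_per_s, cloud_ops_per_s,
                       payload_bits, network_bps):
    """Return "local" or "cloud" based on a simple time estimate.

    render_ops      : assumed amount of work of the rendering process
    local_ops_per_s : assumed processing capability of the local side
    cloud_ops_per_s : assumed processing capability of the cloud side
    payload_bits    : assumed size of the rendered N-channel signals
    network_bps     : measured or assumed network speed
    """
    local_time = render_ops / local_ops_per_s
    cloud_time = render_ops / cloud_ops_per_s + payload_bits / network_bps
    return "local" if local_time <= cloud_time else "cloud"


# Hypothetical figures: a slow network favors local rendering,
# a fast network favors rendering on the cloud.
slow_network = choose_render_site(1e9, 1e9, 1e10, 8e6, 1e6)
fast_network = choose_render_site(1e9, 1e9, 1e10, 8e6, 1e9)
```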

In addition, although all of the first transfer function information 25C and the object-based data 25E are set to be stored in the server device 25 in FIG. 12 above, at least some of the information may be stored on the signal processing device 30 side. In this case, in the signal processing device 30, information of the first transfer function, the object-separated sound source, and the second transfer function of the place specified from the designated position information is acquired from a storage unit inside the signal processing device 30 and used in processes.

6. Modified Examples

6-1. Regarding a Closed Surface

Here, although not particularly mentioned in the above description, considering the sound field reproduction techniques of the embodiments described above, the closed surface 1B on which the plurality of speakers 2B are disposed in the reproduction environment and the closed surface 4B on which the plurality of mics 5B are disposed in the same reproduction environment may each be set to surround the user 0, and the closed surface 1B and the closed surface 4B may intersect each other.

FIG. 17 is a diagram exemplifying relations between the closed surface 1B and the closed surface 4B.

FIG. 17A is an example in which the closed surface 1B is set to surround the user 0 and the closed surface 1B is set inside the closed surface 4B. FIG. 17B is an example in which the closed surface 1B is in closer proximity to the closed surface 4B than in the example shown in FIG. 17A. In addition, FIG. 17C is an example in which both the closed surface 1B and the closed surface 4B are set to surround the user 0, but a part of the closed surface 1B protrudes from the closed surface 4B.

In addition, the example shown in FIG. 17D is a variation of the example of FIG. 17C in which only the closed surface 4B is set to surround the user 0. In addition, in the example shown in FIG. 17E, the closed surface 1B is set inside the closed surface 4B and the closed surface 4B is set to surround the user 0, but the closed surface 1B is not set to surround the user 0.

Among the examples of FIGS. 17A to 17E, those to which the present technology is properly applied are those shown in FIGS. 17A to 17C.

In other words, it suffices that the closed surface 1B and the closed surface 4B are formed with at least one region in which their parts overlap, and if the user is present in the overlapping region, the present technology is properly applied.

In addition, a shape of a closed surface formed by mics and speakers is not particularly limited as long as it is a shape that can surround the user 0, and for example, a shape of an elliptic closed surface 1B-1, a cylindrical closed surface 1B-2, or a polygonal closed surface 1B-3 as shown in FIG. 18 is possible.

Note that the shapes of the closed surface 1B formed by the plurality of speakers 2B are exemplified in FIG. 18, and they also apply to shapes of the closed surface 4B formed by the plurality of mics 5B.

Here, with respect to an ideal disposition interval of the speakers and mics on a closed surface, it is desirable to arrange them at an interval of half a wavelength of a target frequency or less. However, if this is fully realized, the number of speakers and mics to be installed may become enormous.

In reality, it is desirable to set a realistic number of speakers and mics at which the effect can be experienced.
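To give a feel for the half-wavelength guideline, the following sketch computes the maximum interval for a target frequency and, as a simplified stand-in for a closed surface, the number of devices needed on a single circular ring. The speed of sound of 343 m/s, the ring geometry, and the 1 m radius are assumptions made for this example only.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def max_spacing_m(target_frequency_hz, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Half a wavelength of the target frequency, in meters."""
    wavelength = speed_of_sound / target_frequency_hz
    return wavelength / 2.0


def min_devices_on_ring(radius_m, target_frequency_hz):
    """Smallest device count whose spacing along one ring of the given
    radius does not exceed half a wavelength (a 2-D simplification)."""
    circumference = 2.0 * math.pi * radius_m
    return math.ceil(circumference / max_spacing_m(target_frequency_hz))


spacing_1k = max_spacing_m(1000.0)            # about 0.17 m
ring_1k = min_devices_on_ring(1.0, 1000.0)    # one ring, 1 kHz target
ring_8k = min_devices_on_ring(1.0, 8000.0)    # one ring, 8 kHz target
```

Even for one ring of 1 m radius, the count grows from a few dozen devices at 1 kHz to several hundred at 8 kHz, which illustrates why a realistic number has to be chosen in practice.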

In addition, the case in which the closed surface 1B is inside the closed surface 4B and the closed surface 4B has a larger size than the closed surface 1B has been exemplified in the above description; however, the closed surface 1B may have a larger size than the closed surface 4B.

As an example, FIG. 19 shows a case in which the closed surface 4B is set inside the closed surface 1B.

When the closed surface 4B is disposed inside the closed surface 1B like this, a closed surface 4A on which speakers 2A are disposed is set inside a closed surface 1A on which mics 5A are disposed in the site A as a measurement environment as shown in FIG. 20.

6-2. Regarding Directivity

With respect to the mics 5A and 5B, the case in which directional mics are used has been exemplified in the above description; however, it is not necessary for the mics 5A and 5B to have directivity as single devices, and omni-directional mics can also be used.

In such a case, by forming a so-called mic array using a plurality of omni-directional mics, an output equivalent to that of directional mics can be obtained.

FIG. 21 shows an example of a configuration for obtaining an output which is equivalent to that of directional mics by using omni-directional mics 5A or 5B.

The mics 5A or 5B are set to be disposed at the edge from number 1 to number 5 in the order shown in the drawing. In addition, together with the number 1 to number 5 mics 5A or 5B, two sets of delay circuits, each set having three circuits, are provided in this case (a set of the delay circuits 54-11 to 54-13 and another set of the delay circuits 54-21 to 54-23). Outputs from the delay circuits 54-11 to 54-13 are added by an addition unit 55-1 and outputs from the delay circuits 54-21 to 54-23 are added by an addition unit 55-2 and then output as shown in the drawing.

An output of the number 1 mic 5A or 5B, an output of the number 2 mic 5A or 5B, and an output of the number 3 mic 5A or 5B are input to the delay circuit 54-11, the delay circuit 54-12, and the delay circuit 54-13, respectively. In addition, the output of the number 2 mic 5A or 5B, the output of the number 3 mic 5A or 5B, and an output of the number 4 mic 5A or 5B are input to the delay circuit 54-21, the delay circuit 54-22, and the delay circuit 54-23, respectively.

In the configuration described above, for example, by appropriately setting the delay amounts of the delay circuits 54-11 to 54-13, a sound collection signal of a predetermined first direction which can be realized with sound collection signals of the number 1 to number 3 mics 5A or 5B can be obtained as an output of the addition unit 55-1. Likewise, by appropriately setting the delay amounts of the delay circuits 54-21 to 54-23, a sound collection signal of a predetermined second direction which can be realized with sound collection signals of the number 2 to number 4 mics 5A or 5B can be obtained as an output of the addition unit 55-2.

By applying appropriate delays to the sound collection signals of the plurality of arrayed omni-directional mics and adding (combining) them together as described above, a mic array can be formed and an output equivalent to that of directional mics can be obtained.

Note that, although the sound collection signals from three mics are delayed and added to realize one direction of directivity in the example of FIG. 21, directivity can be expressed as long as sound collection signals from at least two mics are delayed and added.
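The delay-and-add operation described above can be sketched as a simple delay-and-sum over three mics. This is a minimal illustration under stated assumptions, not the circuit of FIG. 21: the mics are assumed to lie on a straight line with equal spacing, the incoming sound is assumed to be a plane wave, delays are rounded to whole samples, and the function names, spacing, and sample rate are hypothetical.

```python
import math


def compensating_delays(num_mics, spacing_m, angle_deg, sample_rate_hz,
                        speed_of_sound=343.0):
    """Whole-sample delays that time-align a plane wave on a line array.

    Convention assumed here: the wave reaches mic i at i * step samples,
    so the earliest mic receives the largest compensating delay.
    """
    step = (spacing_m * math.sin(math.radians(angle_deg))
            / speed_of_sound * sample_rate_hz)
    arrivals = [i * step for i in range(num_mics)]
    latest = max(arrivals)
    return [round(latest - a) for a in arrivals]


def delay_and_sum(signals, delays):
    """Delay each mic signal by its sample count, add, and normalize."""
    length = len(signals[0]) + max(delays)
    out = [0.0] * length
    for sig, d in zip(signals, delays):
        for k, x in enumerate(sig):
            out[k + d] += x
    return [y / len(signals) for y in out]


# Hypothetical case: an impulse reaches mics 1, 2, 3 at samples 0, 1, 2.
mic_signals = [[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]]
delays = compensating_delays(3, 0.343, 90.0, 1000.0)
steered = delay_and_sum(mic_signals, delays)        # aligned: adds coherently
unsteered = delay_and_sum(mic_signals, [0, 0, 0])   # not aligned: smeared
```

With the compensating delays the three impulses coincide and add up to full amplitude, whereas without them each impulse contributes only one third, which is the directivity effect the delay circuits and the addition unit provide.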

In addition, for speakers, by forming an array speaker in the same manner, the function of directivity can be realized even when the devices themselves are omni-directional.

FIG. 22 shows an example of a configuration for obtaining an output which is equivalent to that of directional speakers by using omni-directional speakers 2A or 2B.

Speakers 2A or 2B are disposed at the edge from number 1 to number 5 in the order shown in the drawing in this case as well. In addition, together with the number 1 to number 5 speakers 2A or 2B, two sets of delay circuits, each set having three circuits, are provided (a set of the delay circuits 56-11 to 56-13 and another set of the delay circuits 56-21 to 56-23). Acoustic signals that must be output in a first direction are given to the delay circuits 56-11 to 56-13, and acoustic signals that must be output in a second direction are given to the delay circuits 56-21 to 56-23 as shown in the drawing.

An output of the delay circuit 56-11 is given to the number 1 speaker 2A or 2B. In addition, an output of the delay circuit 56-12 and an output of the delay circuit 56-21 are added by an addition unit 57-1 and given to the number 2 speaker 2A or 2B. In addition, an output of the delay circuit 56-13 and an output of the delay circuit 56-22 are added by an addition unit 57-2 and given to the number 3 speaker 2A or 2B. In addition, an output of the delay circuit 56-23 is given to the number 4 speaker 2A or 2B.

In the configuration described above, for example, by appropriately setting the delay amounts of the delay circuits 56-11 to 56-13, an output sound in the predetermined first direction can be obtained as output sounds of the number 1 to number 3 speakers 2A or 2B. Likewise, by appropriately setting the delay amounts of the delay circuits 56-21 to 56-23, an output sound in the predetermined second direction can be obtained as output sounds of the number 2 to number 4 speakers 2A or 2B.

Note for the sake of clarification that, when an application in which measurement sounds are output in each of the directions (Q1 to QM) in order in a measurement environment is considered, an acoustic signal that must be output in the first direction and an acoustic signal that must be output in the second direction are not given to the delay circuits 56 at the same time, but are given at different timings. When the measurement sounds are output in the first direction, for example, measurement signals are given only to the delay circuits 56-11 to 56-13 and not to the delay circuits 56-21 to 56-23, and on the other hand, when the measurement sounds are output in the second direction, measurement signals are given only to the delay circuits 56-21 to 56-23 and not to the delay circuits 56-11 to 56-13.

By applying appropriate delays to the acoustic signals given to the plurality of arrayed omni-directional speakers as described above, a speaker array can be formed and an action that is equivalent to that of directional speakers can be obtained.
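The routing described for FIG. 22 (two sets of three delay circuits feeding the number 1 to number 4 speakers, with addition units on the number 2 and number 3 speakers) can be sketched as follows. Whole-sample delays, the function names, and the example delay values are assumptions made for illustration; actual delay amounts depend on the speaker spacing and the desired directions.

```python
def delayed(signal, delay, length):
    """Return the signal shifted by `delay` samples, zero-padded to `length`."""
    out = [0.0] * length
    for k, x in enumerate(signal):
        out[k + delay] += x
    return out


def drive_speakers(first, second, delays_first, delays_second):
    """Feeds for speakers number 1 to 4 following the FIG. 22 routing.

    delays_first  : delays of circuits 56-11, 56-12, 56-13 (first direction)
    delays_second : delays of circuits 56-21, 56-22, 56-23 (second direction)
    """
    length = max(len(first) + max(delays_first),
                 len(second) + max(delays_second))
    d11, d12, d13 = (delayed(first, d, length) for d in delays_first)
    d21, d22, d23 = (delayed(second, d, length) for d in delays_second)
    add = lambda a, b: [x + y for x, y in zip(a, b)]
    return [d11,              # number 1 speaker: 56-11 only
            add(d12, d21),    # number 2 speaker: addition unit 57-1
            add(d13, d22),    # number 3 speaker: addition unit 57-2
            d23]              # number 4 speaker: 56-23 only


# Measurement in the first direction only: no signal is given to 56-21 to 56-23.
feeds = drive_speakers([1.0], [], [0, 1, 2], [2, 1, 0])
```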

6-3. Resolution for a Case in Which Sizes and Shapes of Closed Surfaces Differ in a Measurement Environment and a Reproduction Environment

For the sake of convenience in the above description, the case in which the set of the closed surfaces 1B and 1A and the set of the closed surfaces 4B and 4A respectively have the same size and shape in the relation of the site B and the site A has been exemplified; however, it is difficult in reality to precisely match the positions of speakers and mics in a measurement environment with the disposition of mics and speakers in a reproduction environment.

FIG. 23 shows an example of this.

In the site B shown in FIG. 23, the same closed surface 1B and closed surface 4B as shown in FIG. 5 above are assumed to be set.

In this case, ideally in the site A serving as a measurement environment, the closed surface 1A which has the same size and shape as the closed surface 1B and the closed surface 4A which has the same size and shape as the closed surface 4B must be set in the same positional relation as that of the closed surface 1B and the closed surface 4B, but this is very difficult in reality.

In the example of this drawing, a closed surface 1A′ which has a different size and shape from the closed surface 1A and a closed surface 4A′ which has a different size and shape from the closed surface 4A are assumed to be set in the site A as shown in the drawing.

Here, as shown in FIG. 24, speakers 2A disposed on the closed surface 4A′ are set as measurement speakers of an A series. In addition, mics 5A disposed on the closed surface 1A′ are set as measurement mics of a B series. Note that, as described so far, speakers 2A disposed on the original closed surface 4A are set as a Q series, and mics 5A disposed on the original closed surface 1A are set as an R series.

In this case, since the closed surface 4A′ and the closed surface 4A have different sizes and shapes, the numbers of disposed speakers 2A are not the same. While the number of speakers 2A disposed on the original closed surface 4A is M as described above, the number of speakers 2A disposed on the closed surface 4A′ is set to K.

Likewise, since the closed surface 1A′ and the closed surface 1A have different sizes and shapes, the numbers of disposed mics 5A are not the same, and while the number of mics 5A disposed on the original closed surface 1A is N as described above, the number of mics 5A disposed on the closed surface 1A′ is set to L.

In this case, M mics 5B of a V series are disposed on the closed surface 4B, and N speakers 2B of a W series are disposed on the closed surface 1B in the site B.

On this premise, in order to realize proper sound field reproduction of Technique 1, acoustic signals that must be output from each of the speakers 2B may be obtained by performing an arithmetic operation accompanied by conversion of a transfer function as shown by the following Expression 3.

[Math 3]

$$
\begin{pmatrix} W_{1} \\ W_{2} \\ \vdots \\ W_{N} \end{pmatrix}
=
\begin{pmatrix}
BR_{11} & BR_{21} & \cdots & BR_{L1} \\
BR_{12} & & & BR_{L2} \\
\vdots & & & \vdots \\
BR_{1N} & \cdots & & BR_{LN}
\end{pmatrix}
\begin{pmatrix}
AB_{11} & AB_{21} & \cdots & AB_{K1} \\
AB_{12} & & & AB_{K2} \\
\vdots & & & \vdots \\
AB_{1L} & \cdots & & AB_{KL}
\end{pmatrix}
\begin{pmatrix}
QA_{11} & QA_{21} & \cdots & QA_{M1} \\
QA_{12} & & & QA_{M2} \\
\vdots & & & \vdots \\
QA_{1K} & \cdots & & QA_{MK}
\end{pmatrix}
\begin{pmatrix} V_{1} \\ V_{2} \\ \vdots \\ V_{M} \end{pmatrix}
\qquad \text{[Expression 3]}
$$

In Expression 3, AB₁₁ to AB_(KL) indicate transfer functions from the respective positions of the speakers of the A series (A1 to AK) to the respective positions of the mics of the B series (B1 to BL). The transfer functions AB₁₁ to AB_(KL) are measured from the results of sequentially outputting measurement sounds at each of the positions of the speakers (at K spots in this case) and sequentially collecting the sounds with each of the mics 5A (L mics in this case) in the measurement environment, like the above transfer functions QR₁₁ to QR_(MN).

In addition, in Expression 3, BR₁₁ to BR_(LN) indicate transfer functions from the respective positions of the mics of the B series (B1 to BL) to the respective positions of the mics of the R series (R1 to RN).

The transfer functions BR₁₁ to BR_(LN) can be measured in a predetermined environment, for example, an anechoic chamber or the like, without actually constructing the closed surface 1A′ and the closed surface 1A that are in the positional relation shown in the drawing in the site A serving as the measurement environment. Specifically, closed surfaces having the same sizes and shapes as the closed surface 1A′ and the closed surface 1A are respectively set as a closed surface 1a′ and a closed surface 1a, and the closed surface 1a′ and the closed surface 1a are set in, for example, an anechoic chamber in the same positional relation as the closed surface 1A′ and the closed surface 1A shown in the drawing. Measurement sounds are then sequentially output from speakers at each of the positions (B1 to BL) of the B series on the closed surface 1a′, and the transfer functions can be measured from results obtained by sequentially collecting the sounds with the mics disposed at each of the positions (R1 to RN) of the R series on the closed surface 1a.

In addition, in Expression 3, QA₁₁ to QA_(MK) indicate transfer functions from the respective positions of the speakers of the Q series (Q1 to QM) to the respective positions of the speakers of the A series (A1 to AK).

The transfer functions QA₁₁ to QA_(MK) can also be measured in, for example, an anechoic chamber or the like. Specifically, closed surfaces having the same sizes and shapes as the closed surface 4A and the closed surface 4A′ are respectively set as a closed surface 4a and a closed surface 4a′, and the closed surface 4a and the closed surface 4a′ are set in, for example, an anechoic chamber in the same positional relation as the closed surface 4A and the closed surface 4A′ shown in the drawing. Measurement sounds are then sequentially output from the speakers at each of the positions (Q1 to QM) of the Q series on the closed surface 4a, and the transfer functions can be measured from results obtained by sequentially collecting the sounds using mics disposed at each of the positions (A1 to AK) of the A series on the closed surface 4a′.

As described above, by measuring the group of transfer functions from the Q series to the A series and the group of transfer functions from the B series to the R series separately, even when the sizes and shapes of the closed surfaces differ in the measurement environment and the reproduction environment, the transfer functions obtained in the measurement environment can be appropriately converted, and thus appropriate sound field reproduction can be realized.
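The structure of Expression 3 can be checked with a small numerical sketch. It treats each transfer function as a single gain (for example, the value at one frequency bin), so that the chain reduces to ordinary matrix products; with impulse responses, each product would instead be a convolution. The matrix sizes and values below are hypothetical and serve only to show that an N×L matrix, an L×K matrix, and a K×M matrix map M input signals to N output signals.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    assert len(a[0]) == len(b), "inner dimensions must agree"
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def convert_and_apply(br, ab, qa, v):
    """W = BR * AB * QA * V for single-gain transfer functions.

    br : N x L (B series to R series)
    ab : L x K (A series to B series, measured in the site A)
    qa : K x M (Q series to A series)
    v  : M sound collection values (V series)
    """
    combined = matmul(matmul(br, ab), qa)        # N x M
    column = [[x] for x in v]
    return [row[0] for row in matmul(combined, column)]


# Hypothetical sizes: N = 2, L = 1, K = 2, M = 3.
br = [[1.0], [2.0]]
ab = [[1.0, 1.0]]
qa = [[1.0, 0.0, 1.0],
      [0.0, 1.0, 1.0]]
w = convert_and_apply(br, ab, qa, [1.0, 2.0, 3.0])
```

The combined N×M matrix plays the role that the group of transfer functions QR₁₁ to QR_(MN) plays when the closed surfaces match, which is why the numbers K and L used in the measurement environment do not need to equal M and N.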

Note for the sake of clarification that Expression 3 described above means that appropriate sound field reproduction can be realized even when the numbers of mics and speakers to be used in a reproduction environment and a measurement environment are different. As an extreme case, for example, even when a headphone device of two channels of L/R is used in a reproduction environment, by performing measurement of the group of transfer functions from the Q series to the A series and the group of transfer functions from the B series to the R series in the same manner as described above, the group of transfer functions obtained in the measurement environment is converted as in Expression 3, and thereby the sound field can be reproduced.

Here, although the group of first transfer functions necessary for realizing Technique 1 has been described above, even for the group of second transfer functions used in Technique 2, it is possible to resolve the case in which the size and shape of a closed surface are different in a measurement environment and a reproduction environment by converting the group of the transfer functions obtained in the measurement environment based on the same principle.

A specific technique thereof is also disclosed in JP 4775487B based on a proposal of the present inventors; however, for the sake of clarification, an overview of the technique will be described hereinbelow. The description will be provided with reference to FIG. 11 above.

In the reproduction environment (site B), for example, it is assumed that only a closed surface (denoted by a closed surface 1A′, for example) that is smaller than the closed surface 1A shown in FIG. 11 can be set. In this case, the closed surface 1A is set as the Q series (M spots from Q1 to QM), and the closed surface 1A′ is set as a P series (J spots from P1 to PJ).

If there is one spot at which a given sound source S is desired to be localized, for example, the transfer functions measured in the site A which is a measurement environment of this case are transfer functions from that position to the respective positions of the mics of Q1 to QM. The transfer functions are set as Q₁ to Q_(M). If the closed surface of the measurement environment and the closed surface of the reproduction environment have the same sizes and shapes, proper sound field reproduction is possible by processing the sound source S with the transfer functions Q₁ to Q_(M).

In this case, to deal with the closed surface 1A and the closed surface 1A′ having different sizes and shapes, the group of the transfer functions from the Q series to the P series is measured in an environment such as an anechoic chamber. Specifically, the closed surface 1A and the closed surface 1A′ are set in an anechoic chamber, measurement sounds are sequentially output from the speakers at each of the positions (Q1 to QM) of the Q series on the closed surface 1A, and then the transfer functions QP₁₁ to QP_(MJ) are measured from the results obtained by sequentially collecting the sounds using the mics disposed at each of the positions (P1 to PJ) of the P series on the closed surface 1A′.

Moreover, acoustic signals (X₁ to X_(J)) that must be output from J speakers (X1 to XJ) which are disposed in the reproduction environment are obtained using the following Expression 4.

[Math 4]

$$
\begin{pmatrix} X_{1} \\ X_{2} \\ \vdots \\ X_{J} \end{pmatrix}
=
\begin{pmatrix}
QP_{11} & QP_{21} & \cdots & QP_{M1} \\
QP_{12} & & & QP_{M2} \\
\vdots & & & \vdots \\
QP_{1J} & \cdots & & QP_{MJ}
\end{pmatrix}
\begin{pmatrix} Q_{1} \\ Q_{2} \\ \vdots \\ Q_{M} \end{pmatrix}
S
\qquad \text{[Expression 4]}
$$

In this manner, it is also possible to resolve the case in which the closed surfaces have different sizes and shapes in the measurement environment and the reproduction environment (the number of mics in the measurement environment is different from the number of speakers in the reproduction environment) in Technique 2.
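In the time domain, the operation of Expression 4 amounts to convolving the sound source S with each transfer function Q₁ to Q_(M), convolving each result with the corresponding transfer functions to the P series, and summing over the M spots. The following is a minimal sketch under the assumption that all transfer functions are given as short impulse responses; the function names and sample values are hypothetical.

```python
def convolve(a, b):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def add_into(acc, y):
    """Accumulate sequence y into acc, growing acc if needed."""
    if len(acc) < len(y):
        acc.extend([0.0] * (len(y) - len(acc)))
    for k, val in enumerate(y):
        acc[k] += val


def render_source(source, q, qp):
    """Signals X_1 to X_J for the J speakers of the reproduction environment.

    q[m]     : impulse response from the source position to spot Q(m+1)
    qp[m][j] : impulse response from spot Q(m+1) to spot P(j+1)
    """
    outputs = [[] for _ in range(len(qp[0]))]
    for m, q_m in enumerate(q):
        at_q = convolve(source, q_m)               # S processed with Q_m
        for j, out in enumerate(outputs):
            add_into(out, convolve(at_q, qp[m][j]))
    return outputs


# Hypothetical case: M = 2 measurement spots, J = 1 reproduction speaker.
x = render_source([1.0, 2.0],
                  q=[[1.0], [0.0, 1.0]],
                  qp=[[[1.0]], [[0.5]]])
```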

6-4. Measurement Technique Using Moving Objects

In order to realize a reproduction operation as an embodiment, it is desirable to perform measurement of transfer functions in many places. This is in order to increase the number of places that can be reproduced.

Using a moving object such as a vehicle on which a speaker or a mic is mounted is effective for efficiently measuring transfer functions in many places.

Hereinbelow, an example of a measurement technique using a moving object will be described.

FIG. 25 is an illustrative diagram regarding Measurement example 1 in which a moving object is used.

In Measurement example 1, transfer functions are measured in a vehicle 60 on which a plurality of speakers 2A and a plurality of mics 5A are mounted, as shown in FIG. 25A. The plurality of speakers 2A and the plurality of mics 5A in the disposition shown in FIG. 6 above are mounted on the vehicle 60. Measurement example 1 is mainly suited to measuring the first transfer functions necessary for Technique 1.

By repeating measurement and movement using the vehicle 60 as described above, transfer functions are sequentially acquired in each place.

FIG. 25B exemplifies the content of a database of the transfer functions measured in Measurement example 1.

In the database, transfer function IDs, sound emission positions, sound reception positions, measurement dates and times, and data (impulse response measurement data) are associated with each other as shown in the drawing. In this case, for the information of the sound emission positions, position information of a Global Positioning System (GPS) reception device mounted on the vehicle 60 is used. In addition, identification numbers of the mics 5A mounted on the vehicle 60 are set as the information of the sound reception positions of this case.
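A record of such a database could be represented, for instance, as follows. The field names, the coordinate format, and all values are placeholders chosen for this sketch; the actual layout of the database in FIG. 25B is not specified beyond the items listed above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class TransferFunctionRecord:
    transfer_function_id: int
    # Sound emission position: (latitude, longitude) from the GPS
    # reception device mounted on the vehicle (format assumed here).
    emission_position: Tuple[float, float]
    # Sound reception position: identification number of a mic on the vehicle.
    reception_mic_id: int
    measured_at: datetime
    impulse_response: List[float]  # impulse response measurement data


def records_for_mic(database, mic_id):
    """Select all records whose sound was received by the given mic."""
    return [r for r in database if r.reception_mic_id == mic_id]


# Placeholder entries, not measured data.
database = [
    TransferFunctionRecord(1, (35.0, 139.0), 1,
                           datetime(2020, 1, 1, 10, 0), [1.0, 0.5, 0.25]),
    TransferFunctionRecord(2, (35.0, 139.0), 2,
                           datetime(2020, 1, 1, 10, 0), [0.8, 0.4, 0.2]),
]
selected = records_for_mic(database, 2)
```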

FIG. 26 is an illustrative diagram regarding Measurement example 2 in which a moving object is used.

As shown in FIG. 26A, a plurality of mics 5A are installed on the street in a fixed or semi-fixed manner in Measurement example 2. As installation positions of the mics 5A on the street, for example, a ground surface, a utility pole, a wall, a sign, and the like can be exemplified. In addition, installing a mic on a surveillance camera and the like is also conceivable.

In this case, as a moving object, the vehicle 60 that is used in Measurement example 1 (on which the speakers 2A and the mics 5A are mounted) is also used.

With the mics 5A installed on the vehicle 60, the first transfer functions can be measured.

To measure the second transfer functions in this case, measurement sounds emitted from the speakers 2A installed on the vehicle 60 are received by the mics 5A installed on the street (the mics 5A installed on the vehicle 60 may also be used). Since many mics 5A are installed on the street in Measurement example 2, many transfer functions can be obtained in one measurement.

By storing the many transfer functions measured in this way in a database as shown in FIG. 26B, a necessary transfer function can be appropriately selected therefrom and used later.

A difference of the database shown in FIG. 26B from the database shown in FIG. 25B above is that the information of the sound reception positions is set as absolute position information. This facilitates specification of the positional relation with the sound emission positions each time a necessary transfer function is selected from the database.

FIG. 27 is an illustrative diagram regarding Measurement example 3 and Measurement example 4 in which moving objects are used.

Measurement examples 3 and 4 are those in which a plurality of moving objects are used.

In Measurement example 3 shown in FIG. 27A, the vehicle 60, a vehicle 61 ahead of the vehicle 60, and a vehicle 62 behind the vehicle 60 are used as the moving objects.

Here, when vehicles are used as moving objects, the vehicles are driven on a road, particularly in measurement on a street. In this case, it is difficult to fixedly install mics 5A on the road, and if only one vehicle is used, there is a concern that blank segments in which transfer functions are not measured will be formed ahead of and behind the vehicle. In Measurement examples 3 and 4, such blank segments can be filled.

In Measurement example 3 as shown in FIG. 27A, only mics 5A and no speakers 2A are installed in the foremost vehicle 61 and the rearmost vehicle 62. In this example, the database as shown in FIG. 26B above is constructed including the positions of the mics 5A (sound reception positions) on the vehicles 61 and 62.

In addition, in Measurement example 4 of FIG. 27B, a vehicle 63 on which only the speakers 2A are mounted is used instead of the vehicle 60 in Measurement example 3 shown in FIG. 27A.

In this case, the first transfer functions are measured using the mics 5A on the street and the mics 5A on the vehicles 61 and 62.

In addition, with respect to the second transfer functions of this case, many transfer functions can be measured at a time using the mics 5A on the street and the mics 5A on the vehicles 61 and 62.

Here, when a plurality of vehicles are used as in Measurement examples 3 and 4, by varying the distances, directions, and the like of the plurality of vehicles in each case, transfer functions can also be obtained in combinations of more sound emission positions and sound reception positions.

Note that, in measurement using a vehicle, collecting sounds while the vehicle is not stopped but moving is also assumed. In this instance, by also recording the vehicle moving speed at the time of sound collection in a database, the Doppler effect can be subsequently reduced through signal processing.
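As one possible form of such signal processing, the recorded speed can be turned into a Doppler factor and the recording can be time-scaled by that factor. The sketch below uses the standard Doppler relation for motion along the line between source and receiver and a simple linear-interpolation resampler; a constant speed during the recording, a speed of sound of 343 m/s, and the function names are assumptions of this example and are not taken from the embodiment.

```python
def doppler_factor(source_speed_toward, receiver_speed_toward,
                   speed_of_sound=343.0):
    """Ratio of observed to emitted frequency for motion along the
    line joining source and receiver (speeds in m/s, positive when
    approaching)."""
    return ((speed_of_sound + receiver_speed_toward)
            / (speed_of_sound - source_speed_toward))


def undo_doppler(signal, factor):
    """Stretch (factor > 1) or compress (factor < 1) the recording in time
    by linear interpolation so that frequencies are divided by `factor`."""
    n_out = int((len(signal) - 1) * factor) + 1
    out = []
    for n in range(n_out):
        pos = n / factor
        i = int(pos)
        frac = pos - i
        nxt = signal[min(i + 1, len(signal) - 1)]
        out.append(signal[i] * (1.0 - frac) + nxt * frac)
    return out


# A speaker-carrying vehicle approaching a fixed mic at 34.3 m/s.
factor = doppler_factor(34.3, 0.0)
# Simple check of the resampler on a ramp with a factor of 2.
stretched = undo_doppler([0.0, 1.0, 2.0, 3.0], 2.0)
```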

In addition, when the mics 5A are provided on the street, if they are directional mics, it is very difficult to change the direction of directivity thereof after installation, and thus a degree of freedom in measurement is accordingly hampered. Considering this point, by using omni-directional mics as the mics 5A installed on the street, directivity can be changed through the mic array process described above. Accordingly, a degree of freedom in measurement can be enhanced, which is very effective for obtaining transfer functions in more patterns.

6-5. Other Modified Examples

Herein, the following modified examples to the present technology arepossible.

In the above description, the case in which the object-separated sound source is used for the sound field reproduction of Technique 2 has been exemplified; however, processes such as noise removal or reverberation suppression can also be applied to the sound collection signals of the mics 5B in the sound field reproduction of Technique 1.

Here, in Technique 1, sounds for sound field reproduction are output from the speakers 2B disposed in the site B. At this time, the mics 5B, which collect the sound produced by the user 0, are disposed relatively close to the speakers 2B in the site B, and thus the sounds emitted from the speakers 2B for sound field reproduction are also collected by the mics 5B. This means that, whereas the process using the first transfer functions should originally be performed only on sounds emitted by the user 0, it is in fact performed on a sound to which the sounds for sound field reproduction have been added.

Thus, by performing the same process of noise removal or reverberation suppression as for the object-separated sound source on the sound collection signals of the mics 5B as described above, components of the sounds emitted by the user 0 are extracted. In other words, the process using the first transfer functions is then performed on an object-separated sound source. Accordingly, in the sound field reproduction of Technique 1, the S/N ratio can be enhanced and the quality of sound field reproduction can be further improved.

Note that the above-described process of noise removal or reverberation suppression may be performed between, for example, the ADC and amplifying unit 32 and the addition units 33 in the configuration shown in FIG. 12 above.
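The ordering described above, in which suppression is applied to each sound collection signal before the process using the first transfer functions, can be sketched as follows. This is a minimal sketch under assumptions: the first transfer functions are represented as impulse responses `first_tfs[m][n]` from mic m to speaker n, all of equal length, and the suppression stage is left as a hypothetical callback `preprocess` because the source does not specify its algorithm.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def first_process(mic_signals, first_tfs, preprocess=None):
    """Filter each mic signal with its first transfer functions and sum
    the results per speaker.

    mic_signals[m]  : samples collected by mic m
    first_tfs[m][n] : assumed impulse response from mic m to speaker n
    preprocess      : hypothetical noise/reverberation suppression stage,
                      applied to every mic signal before filtering
    """
    if preprocess is not None:
        mic_signals = [preprocess(x) for x in mic_signals]
    outputs = []
    for n in range(len(first_tfs[0])):
        mixed = None
        for m, x in enumerate(mic_signals):
            y = convolve(x, first_tfs[m][n])
            mixed = y if mixed is None else [a + b for a, b in zip(mixed, y)]
        outputs.append(mixed)
    return outputs
```

Keeping the suppression stage as a separate callback mirrors the point that only the components emitted by the user 0 should reach the filters; any concrete suppression method could be substituted without changing the filtering itself.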

In addition, the above description has been provided on the premise that one image is displayed for one place; however, different images can also be displayed for, for example, respective time zones (periods of the day). For example, a plurality of images are photographed and stored for the respective time zones of a reproduction target place. Among these images, an image of, for example, the time zone corresponding to the current time kept by the signal processing device 30 placed in the reproduction environment, or the time zone corresponding to the current time of the reproduction target place (which is obtained by, for example, calculation from the current time kept by the signal processing device 30), is selected and displayed. Alternatively, an image of an arbitrary time zone designated by the user 0 may be selected and displayed.

Note that the reproduction according to a time zone described above can also be applied to the sound field reproduction of Technique 2. Specifically, a plurality of object-separated sound sources are prepared for the respective time zones of one place, and the sound source of the time zone corresponding to the current time of the reproduction environment or of the reproduction target place, or of an arbitrary time zone designated by the user 0, is output as a reproduction sound.

By realizing reproduction according to the time zone in this way, a sense of presence can be further heightened.
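The selection of an image or an object-separated sound source according to the time zone can be sketched as below. The boundaries and labels of the periods in `PERIODS`, the use of a fixed UTC offset to calculate the current time of the reproduction target place, and the function names are all assumptions introduced for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical time-of-day periods: (starting hour, label), in ascending order.
PERIODS = [(5, "morning"), (11, "daytime"), (17, "evening"), (21, "night")]


def period_for_hour(hour):
    """Return the label of the period containing the given hour (0-23)."""
    label = PERIODS[-1][1]  # hours before the first start wrap to the last period
    for start, name in PERIODS:
        if hour >= start:
            label = name
    return label


def select_asset(assets, now_utc, utc_offset_hours, designated=None):
    """Pick the image or sound source stored for the matching period.

    assets           : dict mapping a period label to a stored asset
    now_utc          : current time kept by the device, as an aware UTC datetime
    utc_offset_hours : offset of the place whose local time is to be used
    designated       : period label chosen by the user, if any
    """
    if designated is not None:
        return assets[designated]
    local = now_utc + timedelta(hours=utc_offset_hours)
    return assets[period_for_hour(local.hour)]
```

Passing the offset of the reproduction environment or that of the reproduction target place switches between the two selection criteria described above, while `designated` covers the case of an arbitrary time zone chosen by the user 0.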

In addition, in the above description, the case in which reproduction of a place is performed based on position information designated on a map has been exemplified; however, information on a current position detected by, for example, GPS may be used as the designated position information. In other words, reproduction is performed for a place that is specified from the current position information detected by GPS.

This is favorable for a system in which, for example, a calling partner of the user 0 who is in the reproduction environment is present in a remote place, and the sound field of the place in which the calling partner is located is reproduced. In this case, current position information detected by, for example, a mobile telephone or the like used by the calling partner is transmitted to the server device 25, and the server device 25 specifies the corresponding place based on the current position information.
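How a place might be specified from current position information can be sketched as a nearest-neighbor lookup. The source does not state how the server device 25 performs this step, so the great-circle (haversine) distance, the mean Earth radius of 6371 km, the layout of `places`, and the function names are assumptions made only for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used as an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs
    given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def specify_place(places, current_position):
    """Return the name of the registered place closest to current_position.

    places : dict mapping a place name to its (latitude, longitude)
    """
    return min(places, key=lambda name: haversine_km(places[name], current_position))
```

The returned name would then serve as the key for retrieving the image, transfer functions, and sound sources stored for that place.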

In addition, although the case in which measurement is performed using TSP signals as measurement signals has been exemplified in the above description, measurement may be performed using an M-sequence (maximum length sequence) instead.
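For reference, an M-sequence can be generated with a linear feedback shift register, as sketched below. The register length of 4 bits and the tap positions are assumptions chosen so that the period is 15; a practical measurement would use a much longer register. The function names are hypothetical.

```python
def m_sequence(n_bits=4, taps=(4, 3)):
    """Generate one period (2**n_bits - 1 bits) of an M-sequence with a
    Fibonacci linear feedback shift register.

    taps are 1-based register positions XORed to form the feedback bit;
    the default taps are assumed to give the maximum period for 4 bits.
    """
    state = [1] * n_bits          # any non-zero initial state works
    bits = []
    for _ in range(2 ** n_bits - 1):
        bits.append(state[-1])    # output the last register stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return bits


def circular_autocorrelation(bits):
    """Circular autocorrelation of the sequence mapped to +1 / -1."""
    x = [1 if b else -1 for b in bits]
    n = len(x)
    return [sum(x[i] * x[(i + lag) % n] for i in range(n)) for lag in range(n)]
```

The autocorrelation of such a sequence has a single peak at lag zero and a constant small value elsewhere, which is the property that allows an impulse response to be recovered by cross-correlating the collected signal with the emitted sequence.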

In addition, assume a system in which many transfer functions are measured for combinations of various sound emission positions and sound reception positions on a street, as shown in FIGS. 26 and 27 above, and a necessary transfer function is later selected from among them and used. In such a system, there are cases in which data of the necessary transfer function is not included in the database. When a necessary transfer function is not included in the database in this way, it can be estimated by performing interpolation using the other transfer functions that are present.
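One simple form of such interpolation is sketched below. The source does not specify the interpolation method, so inverse-distance weighting of time-domain impulse responses is an assumption, as are the database layout (a list of position and impulse response pairs of equal length) and the name `interpolate_transfer_function`. Blending in the time domain ignores differences in propagation delay between positions, so this is a rough approximation at best.

```python
import math


def interpolate_transfer_function(database, target_pos, power=2.0):
    """Estimate the impulse response at target_pos from measured entries.

    database   : list of ((x, y), impulse_response) pairs, with all
                 impulse responses assumed to have the same length
    target_pos : (x, y) position for which no measurement exists
    power      : exponent of the inverse-distance weighting
    """
    weights = []
    for pos, ir in database:
        d = math.dist(pos, target_pos)
        if d == 0.0:
            return list(ir)       # an exact measurement exists; use it as-is
        weights.append(1.0 / d ** power)
    total = sum(weights)
    length = len(database[0][1])
    return [
        sum(w * ir[n] for w, (_, ir) in zip(weights, database)) / total
        for n in range(length)
    ]
```

In this sketch a measured position that coincides with the target is returned unchanged, so interpolation only takes effect when the necessary transfer function is actually missing from the database.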

In addition, when the mics 5A are installed on the street in a fixed or semi-fixed manner, sounds of the reproduction target place may be collected using the mics 5A in real time, transmitted to the signal processing device 30 of the reproduction environment via the network 26, and then output from the speakers 2B.

Additionally, the present technology may also be configured as below.

(1)

A signal processing device including:

a display control unit configured to cause a necessary display unit to display an image that corresponds to a place specified from designated position information;

a sound collection signal input unit configured to input a sound collection signal of a sound collection unit that collects a sound produced by a user with a plurality of microphones disposed to surround the user;

an acoustic signal processing unit configured to perform a first acoustic signal process for reproducing a sound field in which the sound produced by the user is sensed as if the sound were echoing in the place specified from the position information on the signal input by the sound collection signal input unit, based on a first transfer function that is measured in the place specified from the designated position information to indicate how a sound emitted on a closed surface inside the place echoes in the place and then is transferred to the closed surface side; and

a sound emission control unit configured to cause a sound that is based on the signal that has undergone the first acoustic signal process by the acoustic signal processing unit to be emitted from a plurality of speakers disposed to surround the user.

(2)

The signal processing device according to (1), further including:

an addition unit configured to add an acoustic signal that is based on a sound source recorded in the place specified from the designated position information to the signal that has undergone the first acoustic signal process.

(3)

The signal processing device according to (2),

wherein the sound source is set to be an object-decomposed sound source, and

wherein the addition unit adds an acoustic signal, obtained by performing a second acoustic signal process for causing a sound that is based on the sound source to be perceived as if the sound were being emitted in the place that is a sound field reproduction target, on the acoustic signal based on the sound source based on a second transfer function that is measured in the place specified from the designated position information to indicate how a sound emitted from the outside of the closed surface inside the place is transferred to the closed surface side, to the signal that has undergone the first acoustic signal process.

(4)

The signal processing device according to any one of (1) to (3), wherein the acoustic signal processing unit adds a necessary acoustic signal to the sound collection signal that has not yet undergone the first acoustic signal process.

(5)

The signal processing device according to any one of (1) to (4), wherein the acoustic signal processing unit performs the first acoustic signal process that is based on the first transfer function on a sound source that is obtained by object-decomposing the sound collection signal.

(6)

The signal processing device according to any one of (1) to (5),

wherein the first transfer function measured for each place that is a sound field reproduction target is stored in an external device, and

wherein an acquisition unit configured to acquire a transfer function to be used by the acoustic signal processing unit in the first acoustic signal process from the external device based on the designated position information is further provided.

(7)

The signal processing device according to any one of (3) to (6),

wherein the object-decomposed sound source and the second transfer function of each place that is a sound field reproduction target are stored in an external device,

wherein a rendering unit configured to execute the second acoustic signal process is further provided,

wherein an acquisition unit configured to acquire the second transfer function and an acoustic signal that is based on the object-decomposed sound source to be used in the second acoustic signal process by the rendering unit from the external device, based on the designated position information is further provided, and

wherein the addition unit adds the acoustic signal obtained by the rendering unit performing the second acoustic signal process based on the acoustic signal and the second transfer function acquired by the acquisition unit, to the signal that has undergone the first acoustic signal process.

(8)

The signal processing device according to any one of (3) to (6),

wherein a rendering unit that executes the second acoustic signal process is provided in an external device,

wherein an acquisition unit configured to acquire the acoustic signal obtained by performing the second acoustic signal process by the external device is further provided, and

wherein the addition unit adds the acoustic signal acquired by the acquisition unit to the signal that has undergone the first acoustic signal process.

REFERENCE SIGNS LIST

0 user

1A, 1B, 4A, 4B closed surface (acoustic closed surface)

2A, 2B speaker

3 display device

5A, 5B microphone

10 measurement device

11-1 to 11-M, 12-1 to 12-N, 39-1 to 39-N, 43 terminal unit

13, 32 ADC and amplifying unit

14 transfer function measurement unit

15, 40 control unit

16 measurement signal output unit

17, 38 DAC and amplifying unit

18 selector

19 signal component decomposition processing unit

20, 21 multiplication unit

22, 31-1 to 33-M, 37-1 to 37-N, 51-1 to 51-N, 55-1, 55-2, 57-1, 57-2 addition unit

25 server device

26 network

30 signal processing device

34, 36 howling control and echo cancellation unit

41 operation unit

42 display control unit

44 communication unit

45 memory

46 reference sound replay unit

47, 52 rendering unit

50-11 to 50-1N, 50-21 to 50-2N, 50-M1 to 50-MN filter

53 output control unit

54-11 to 54-13, 54-21 to 54-23, 56-11 to 56-13, 56-21 to 56-23 delay circuit

1-9. (canceled)
10. A signal processing device comprising: a sound collection signal input unit configured to input a sound collection signal of a sound collection unit that collects a sound produced by a user at the user's location with more than two microphones disposed at more than two positions and in a circular arrangement to surround the user; an acoustic signal processing unit configured to perform a first acoustic signal process on the signal input by the sound collection signal input unit for reproducing a sound field, in which the sound produced by the user is sensed as if the sound were echoing in a place specified from designated position information, based on a first transfer function that is measured in the place specified from the designated position information to indicate how a sound emitted on a closed surface inside the place echoes in the place and then is transferred to the closed surface side; and a sound emission control unit configured to cause a sound that is based on the signal that has undergone the first acoustic signal process by the acoustic signal processing unit to be emitted from more than two speakers disposed at more than two positions and in a circular arrangement to surround the user; wherein the sound emission control unit suppresses at least one of noise and reverberation signals.
11. The signal processing device according to claim 10, further comprising: an addition unit configured to add an acoustic signal that is based on a sound source recorded in the place specified from the designated position information to the signal that has undergone the first acoustic signal process.
12. The signal processing device according to claim 11, wherein the sound source is set to be an object-decomposed sound source, and wherein the addition unit adds an acoustic signal, obtained by performing a second acoustic signal process for causing a sound that is based on the sound source to be perceived as if the sound were being emitted in the place that is a sound field reproduction target, on the acoustic signal based on the sound source based on a second transfer function that is measured in the place specified from the designated position information to indicate how a sound emitted from the outside of the closed surface inside the place is transferred to the closed surface side, to the signal that has undergone the first acoustic signal process.
13. The signal processing device according to claim 10, wherein the acoustic signal processing unit adds an acoustic signal to the sound collection signal that has not yet undergone the first acoustic signal process.
14. The signal processing device according to claim 10, wherein the acoustic signal processing unit performs the first acoustic signal process that is based on the first transfer function on a sound source that is obtained by object-decomposing the sound collection signal.
15. The signal processing device according to claim 10, wherein the first transfer function measured for each place that is a sound field reproduction target is stored in an external device, and wherein an acquisition unit configured to acquire a transfer function to be used by the acoustic signal processing unit in the first acoustic signal process from the external device based on the designated position information is further provided.
16. The signal processing device according to claim 12, wherein the object-decomposed sound source and the second transfer function of each place that is a sound field reproduction target are stored in an external device, wherein a rendering unit configured to execute the second acoustic signal process is further provided, wherein an acquisition unit configured to acquire the second transfer function and an acoustic signal that is based on the object-decomposed sound source to be used in the second acoustic signal process by the rendering unit from the external device, based on the designated position information is further provided, and wherein the addition unit adds the acoustic signal obtained by the rendering unit performing the second acoustic signal process based on the acoustic signal and the second transfer function acquired by the acquisition unit, to the signal that has undergone the first acoustic signal process.
17. The signal processing device according to claim 12, wherein a rendering unit that executes the second acoustic signal process is provided in an external device, wherein an acquisition unit configured to acquire the acoustic signal obtained by performing the second acoustic signal process by the external device is further provided, and wherein the addition unit adds the acoustic signal acquired by the acquisition unit to the signal that has undergone the first acoustic signal process.
18. The signal processing device according to claim 10, wherein each microphone of the more than two microphones comprises a directional microphone with a direction of directivity facing an inward direction, and each speaker of the more than two speakers comprises a directional speaker with a direction of sound emission facing the inward direction.
19. A signal processing method using a sound collection unit that collects a sound produced by a user with more than two microphones disposed at more than two positions and in a circular arrangement to surround the user, and a sound emission unit that performs sound emission with more than two speakers disposed at more than two positions and in a circular arrangement to surround the user, the method comprising: an acoustic signal processing procedure in which a first acoustic signal process is performed on a sound collection signal of the sound collection unit for reproducing a sound field, in which a sound produced by the user is sensed as if the sound were echoing in a place specified from designated position information, based on a first transfer function that is measured in the place specified from the designated position information to indicate how a sound emitted from a closed surface side inside the place echoes in the place and then is transferred to the closed surface side; and a sound emission control procedure in which a sound that is based on the signal that has undergone the first acoustic signal process in the acoustic signal processing procedure is caused to be emitted from the sound emission unit; wherein the sound emission control procedure suppresses at least one of noise and reverberation signals.
20. At least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by at least one hardware processor, cause the at least one hardware processor to perform a signal processing method using a sound collection unit that collects a sound produced by a user with more than two microphones disposed at more than two positions and in a circular arrangement to surround the user, and a sound emission unit that performs sound emission with more than two speakers disposed at more than two positions and in a circular arrangement to surround the user, the method comprising: an acoustic signal processing procedure in which a first acoustic signal process is performed on a sound collection signal of a sound collection unit for reproducing a sound field, in which a sound produced by the user is sensed as if the sound were echoing in a place specified from designated position information, based on a first transfer function that is measured in the place specified from the designated position information to indicate how a sound emitted from a closed surface side inside the place echoes in the place and then is transferred to the closed surface side; and a sound emission control procedure in which a sound that is based on the signal that has undergone the first acoustic signal process in the acoustic signal processing procedure is caused to be emitted from a sound emission unit; wherein the sound emission control procedure suppresses at least one of noise and reverberation signals.